1 Introduction



Risk theory is the part of insurance mathematics that is concerned with
stochastic models for the flow of payments in an insurance business. The purpose
of insurance is in general to level out fluctuations in the cost for the policyholder
and to replace the often strongly varying cost with a more predictable flow of
payments. To achieve this, a large group of risks - a "collective" - is created
in which the costs of an individual member can be highly stochastic, but where
the total cost is levelled out as a consequence of the law of large numbers.

In these lecture notes we will describe some basic natural models for "risk
processes" and derive various types of asymptotic laws for the fluctuations in the
amount of loss. We will also investigate how the fluctuations depend on
variables such as reserve capital, premium amount, reinsurance arrangements, size
of the collective and distribution of the included variables. Models for both life
and property insurance will be considered.

One can distinguish between two different types of risk: the insurance risk and
the uncertainty concerning the future returns from the collected reserve capital.
These notes will mainly be concerned with the former type of risk, which is
generally better known from a statistical point of view because it changes more
slowly over time, so that observed losses can be expected to be relevant in predicting
future losses. An important difference between the two types is also that
uncertainty in, for instance, the development of the interest rate cannot be levelled out
in the same way as the first type of risk, since it cannot be decomposed as a
sum of many contributions obeying the law of large numbers. However, it is
of interest to model the influence of both types of risk, and indeed the substantial
development of financial mathematics in recent years has resulted in
several models for financial risks. In recent research these models are combined
with models from traditional risk theory in an interesting way, and new types
of contracts are being analyzed.

Risk theory as a branch of probability has a long tradition, particularly within
Swedish insurance research. Some of the models that we will be interested
in were formulated as early as the beginning of the 20th century in works by
Filip Lundberg and Harald Cramér, and the theory of ruin probabilities that
we will consider was developed in the 1930s-1950s by Cramér, Esscher, Segerdahl
and Arfwedson among others. This research inspired the development of the
theory of stochastic processes, and during the 1960s-1980s it turned out that
many problems in queueing theory, storage theory and risk theory are closely
related and can be solved by the same methods. This has resulted in several
simplifications of the theory, in that technically complicated analytical methods
have been replaced by probabilistic techniques which are more intuitive. In
these notes we will, as far as possible, use these probabilistic methods.






2 Stochastic models for the total amount of loss 
during a fixed period 

In risk theory there are two basic models for the amount of loss in an insurance 
collective: the individual model and the collective model. Both these models are 
described in this section. We also derive approximations for tail probabilities 
for the distribution of the total amount of loss. 

2.1 The individual risk model 

In this model we consider a (large) number of individual policies - for instance
we can think of whole life assurances - that are in effect during, let's say, one
financial year. For each policy $i$ there is a (small) probability $p_i$ that a
loss occurs, and a probability $q_i = 1 - p_i$ that no loss occurs. If a loss occurs, the
amount $x_i$ is paid to the policyholder, where $x_i$ is specified in the agreement.
The losses are assumed to be independent. Let $\{M_i\}$ be independent Bernoulli
variables with $P(M_i = 1) = 1 - P(M_i = 0) = p_i$. Then the individual amount
of loss can be written as $x_i M_i$ and the total loss is given by $X := \sum_i x_i M_i$.
Since the total loss is a sum of independent random variables, it is natural to
define its distribution via the generating function $E[e^{\xi X}]$, which is the product
of the individual generating functions, that is,

$$E[e^{\xi X}] = \prod_i E[e^{\xi x_i M_i}] = \prod_i \left(q_i + p_i e^{\xi x_i}\right).$$

The mean and variance of the individual losses are $E[x_i M_i] = x_i p_i$ and
$Var(x_i M_i) = x_i^2 p_i q_i$, implying that $E[X] = \sum_i x_i p_i$ and
$Var(X) = \sum_i x_i^2 p_i q_i$. Now, since $X$ is a sum of independent random
variables, a natural approach might be to approximate its distribution with a
normal distribution with these parameters, that is, one could believe that

$$P(X \le x) \approx \Phi\left(\frac{x - E[X]}{\sqrt{Var(X)}}\right),$$

where $\Phi$ denotes the standard normal distribution function.
However, this approximation often turns out to be quite poor because
$p_i$ is typically very small, so that rather few losses occur even when the
number of policies is large. In such a situation it is more natural to approximate
the distribution of $X$ with a so-called compound Poisson distribution, which is
constructed as follows: Let $\{N_i\}$ be independent Poisson distributed variables
with $E[N_i] = \lambda_i$, that is,

$$P(N_i = k) = \frac{\lambda_i^k}{k!} e^{-\lambda_i}, \qquad k = 0, 1, 2, \ldots$$

Pick $\lambda_i$ so that $P(N_i = 0) = q_i$ and put

$$M_i = \begin{cases} 0 & \text{if } N_i = 0, \\ 1 & \text{if } N_i \ge 1. \end{cases}$$

Then $P(M_i = 0) = 1 - P(M_i = 1) = q_i$ and hence $M_i$ has the right distribution.
Moreover, when $p_i$ is small, $M_i = N_i$ with large probability. To see this, note
that

$$\begin{aligned}
P(M_i \ne N_i) &= P(N_i \ge 2) \\
&= 1 - P(N_i = 0) - P(N_i = 1) \\
&= 1 - e^{-\lambda_i} - \lambda_i e^{-\lambda_i} \\
&\approx 1 - \left(1 - \lambda_i + \lambda_i^2/2\right) - \lambda_i (1 - \lambda_i) \\
&= \lambda_i^2/2.
\end{aligned}$$

By the choice of $\lambda_i$, we have $1 - p_i = e^{-\lambda_i}$, and, since $e^{-\lambda_i} \approx 1 - \lambda_i$, it follows
that $p_i \approx \lambda_i$. Hence $P(M_i \ne N_i) \approx p_i^2/2$ when $p_i$ is small. In this situation it is
natural to approximate $X$ with $S := \sum_i x_i N_i$. This quantity has a compound
Poisson distribution and, since

$$P(M_i \ne N_i \text{ for some } i) \le \sum_i P(M_i \ne N_i) \approx \frac{1}{2} \sum_i p_i^2,$$

the approximation is good if $\sum_i p_i^2$ is small.
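The size of the mismatch probability can be checked numerically. The following is a minimal sketch, not part of the original notes: the function name `mismatch_probability` and the sample values of $p$ are illustrative assumptions. It evaluates $P(N_i \ge 2)$ exactly for $\lambda_i = -\log(1 - p_i)$ and compares it with the approximation $p_i^2/2$.

```python
import math

def mismatch_probability(p):
    """Exact P(M_i != N_i) = P(N_i >= 2) when lambda_i = -log(1 - p)."""
    lam = -math.log(1.0 - p)          # chosen so that P(N_i = 0) = 1 - p = q_i
    return 1.0 - math.exp(-lam) * (1.0 + lam)

# Hypothetical loss probabilities, chosen only for illustration.
for p in (0.1, 0.01, 0.001):
    exact = mismatch_probability(p)
    print(f"p = {p}: exact = {exact:.3e}, p^2/2 = {p * p / 2:.3e}")
```

The relative error of the approximation $p_i^2/2$ shrinks as $p_i$ decreases, in line with the expansion above.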

Just as the distribution of $X$, the distribution of $S$ can be defined via its
generating function. Remember that the generating function for a Poisson distributed
variable is given by

$$E[e^{\xi N_i}] = \sum_{k=0}^{\infty} e^{\xi k} \frac{\lambda_i^k}{k!} e^{-\lambda_i} = \exp\{\lambda_i (e^{\xi} - 1)\}.$$

Since $\{N_i\}$ are independent, we have

$$E[e^{\xi S}] = \prod_i E[e^{\xi x_i N_i}] = \prod_i \exp\{\lambda_i (e^{\xi x_i} - 1)\} = \exp\Big\{\sum_i \lambda_i (e^{\xi x_i} - 1)\Big\}.$$

Introduce the notation $g(\xi) = \sum_i \lambda_i (e^{\xi x_i} - 1)$. We then have $E[e^{\xi S}] = e^{g(\xi)}$ or,
equivalently, $g(\xi) = \log E[e^{\xi S}]$. In what follows we will derive approximations
for the distribution of $S$ and thereby hopefully also for the distribution of $X$.
In this context it is worth noting that $M_i \le N_i$ for all $i$ and hence $X \le S$ if
$x_i \ge 0$ for all $i$. This implies that $P(X > x) \le P(S > x)$ and thus, if we can
find an upper bound for $P(S > x)$, then this bound is valid also for $P(X > x)$.

The function $g(\xi)$ can be expressed in a slightly different way using the so-called
risk mass distribution, $F(dx)$. Let $\lambda = \sum_i \lambda_i$ and construct $F(dx)$ by placing
the mass $\lambda_i/\lambda$ at the point $x_i$ on the x-axis, $i = 1, 2, 3, \ldots$, as demonstrated in
Figure 1. We then have

$$g(\xi) = \lambda \int_0^{\infty} (e^{\xi x} - 1) F(dx) \quad \text{and} \quad E[e^{\xi S}] = e^{g(\xi)}. \tag{1}$$

Figure 1: Construction of F(dx).
More generally we can consider $g(\xi)$ and $S$ defined in this way with an arbitrary
probability distribution $F(dx)$ and some constant $\lambda < \infty$. The distribution of
$S$ is then called a compound Poisson distribution and the following proposition
gives a fundamental characterization of $S$.

Proposition 2.1 Let $\{X_k\}$ be independent random variables with distribution
$F(dx)$ and let $N$ be a Poisson distributed variable, independent of $\{X_k\}$, with
$E[N] = \lambda$. Define $S = \sum_{k=1}^{N} X_k$. Then $S$ has a compound Poisson distribution
defined by (1).

Proof: Let $f(\xi)$ be the generating function of the distribution $F(dx)$, that is,

$$f(\xi) = E[e^{\xi X_k}] = \int_0^{\infty} e^{\xi x} F(dx).$$

For each fixed $n$, the sum $S_n := \sum_{k=1}^{n} X_k$ has distribution $F^{n*}(dx)$ - the
convolution of $F$ with itself $n$ times - with generating function $E[e^{\xi S_n}] = f^n(\xi)$.
Hence, $P(S \in dx \mid N = n) = F^{n*}(dx)$, and, summing over the possible values of
$N$, we obtain

$$P(S \in dx) = \sum_{n=0}^{\infty} \frac{\lambda^n}{n!} e^{-\lambda} F^{n*}(dx). \tag{2}$$

The corresponding generating function is

$$\begin{aligned}
E[e^{\xi S}] &= \sum_{n=0}^{\infty} E[e^{\xi S} \mid N = n] \, P(N = n) \\
&= \sum_{n=0}^{\infty} f^n(\xi) \frac{\lambda^n}{n!} e^{-\lambda} \\
&= e^{\lambda (f(\xi) - 1)}.
\end{aligned}$$

With $g(\xi)$ defined as in (1), we have $\lambda (f(\xi) - 1) = g(\xi)$ and hence $E[e^{\xi S}] = e^{g(\xi)}$,
as desired. □



2.2 The collective risk model 

In the individual risk model for a portfolio of whole life assurances, the collective
changes over time as more and more policyholders die. However, for moderate
times and large collectives this effect can often be neglected. A natural
approximation then is to consider a collective that is stationary in time in the
sense that $\lambda$ and $F(dx)$ are constant and the number of losses in a time interval
of length $t$ is Poisson distributed with expected value $\lambda t$, the number of losses
in disjoint time intervals being independent. Below we give a description of the
total loss process $S(t)$ in the interval $(0,t]$ motivated by this observation.

Assume that the losses occur at time points $T_1, T_2, \ldots$ that constitute a Poisson
process in time, that is, the increments $Y_k := T_k - T_{k-1}$ (with $T_0 := 0$) are independent and
exponentially distributed with density $\lambda e^{-\lambda y} dy$. At each time of loss $T_k$, an
amount of damage $X_k > 0$ is generated. The variables $\{X_k\}$ are assumed to
be independent with distribution $F(dx)$ and the total loss in $(0,t]$ is given by
$S(t) := \sum_{T_k \in (0,t]} X_k$. As illustrated in Figure 2, the process $S(t)$ is a step
function with jumps of height $X_k$ at the times $T_k$.

To specify the distribution of $S(t)$, let $N(t)$ denote the number of losses in the
interval $(0,t]$. We then have $S(t) = \sum_{k=1}^{N(t)} X_k$. The process $\{N(t)\}_{t \ge 0}$ is a
Poisson process with independent increments in disjoint intervals and hence the
increments of $S(t)$ - that is, the sums of the amounts of loss in disjoint intervals
- are also independent. Furthermore, since

$$P(N(t) = n) = \frac{(\lambda t)^n}{n!} e^{-\lambda t},$$

by proceeding as in the derivation of (2), we obtain

$$P(S(t) \in dx) = \sum_{n=0}^{\infty} \frac{(\lambda t)^n}{n!} e^{-\lambda t} F^{n*}(dx).$$

This means that, just like $S$ in the previous subsection, $S(t)$ has a compound
Poisson distribution and hence its generating function is given by

$$E[e^{\xi S(t)}] = \sum_{n=0}^{\infty} \frac{(\lambda t)^n}{n!} e^{-\lambda t} f^n(\xi) = e^{\lambda t (f(\xi) - 1)} = e^{t g(\xi)},$$

where, as before, $f(\xi)$ is the generating function of the distribution $F(dx)$ and
$g(\xi) = \lambda \int_0^{\infty} (e^{\xi x} - 1) F(dx)$.

Figure 2: The total loss process S(t).

The above formulas define the collective risk model, which will be thoroughly
studied in the following. The model can be used to describe both a life
assurance business and a property insurance business. The total loss process $\{S(t)\}$
has independent stationary increments with a compound Poisson distribution
defined by $\lambda$ and $F(dx)$, and the expected value and variance of $S(t)$ can be
obtained by differentiating the generating function. Introducing the notation
$\mu = \int_0^{\infty} x F(dx)$ and $\nu = \int_0^{\infty} x^2 F(dx)$, we get

$$E[S(t)] = t g'(0) = t \lambda \int_0^{\infty} x F(dx) = t \lambda \mu$$

and

$$Var(S(t)) = t g''(0) = t \lambda \int_0^{\infty} x^2 F(dx) = t \lambda \nu.$$
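The moment formulas can be checked by simulating the collective model directly from its definition: exponential gaps between losses and independent loss amounts. The sketch below is illustrative only. The function names, the seed and the parameter values are assumptions, and the loss distribution is taken to be exponential with mean 1 (so that $\mu = 1$ and $\nu = 2$).

```python
import random

def simulate_total_loss(lam, t, draw_loss, rng):
    """One draw of S(t): losses arrive with Exp(lam) gaps, each costs draw_loss(rng)."""
    total, clock = 0.0, rng.expovariate(lam)
    while clock <= t:
        total += draw_loss(rng)
        clock += rng.expovariate(lam)
    return total

def sample_mean_var(values):
    """Sample mean and unbiased sample variance."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, var

rng = random.Random(1)
lam, t = 2.0, 5.0
# Assumed loss distribution: exponential with mean 1, so mu = 1 and nu = 2.
draws = [simulate_total_loss(lam, t, lambda r: r.expovariate(1.0), rng)
         for _ in range(20000)]
mean, var = sample_mean_var(draws)
print(mean, t * lam * 1.0)   # sample mean vs. t*lambda*mu
print(var, t * lam * 2.0)    # sample variance vs. t*lambda*nu
```

Up to Monte Carlo error, the sample mean and variance should be close to $t\lambda\mu = 10$ and $t\lambda\nu = 20$.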



2.3 A method for calculating the distribution of S(t) 

Suppose that we have a fixed planning period. It is then important to be able
to calculate $P(S(t) > x)$ - the probability that the total loss exceeds $x$ - as
a function of $x$. In general it is not possible to find simple formulas for this
probability. However, if the amounts of damage $X_k$ are integer-valued - that is,
if $X_k \in \{1, 2, 3, \ldots\}$ - then $S(t)$ is integer-valued as well and it turns out that
in this case we can derive a recursion formula for the masses of its distribution.
This so-called Panjer recursion is easy to implement numerically and is widely
used. To describe it, assume for simplicity that $t = 1$ and write $S(1) = S$. Also,
let $f_x := P(X_k = x)$ ($x = 1, 2, 3, \ldots$) and $g_y := P(S = y)$ ($y = 0, 1, 2, \ldots$). Here
the probabilities $\{f_x\}$ are assumed to be known and we want to calculate $\{g_y\}$.
To this end, introduce the generating functions

$$\varphi(s) := \sum_{x=1}^{\infty} s^x f_x \quad \text{and} \quad \gamma(s) := \sum_{y=0}^{\infty} s^y g_y.$$

Since $f(\xi) = E[e^{\xi X_k}] = \sum_x e^{\xi x} f_x$, we have $f(\xi) = \varphi(e^{\xi})$. Now let $\xi$ and $s$ be
related in that $s = e^{\xi}$. Then $f(\xi) = \varphi(s)$ and, since $\gamma(s) = E[e^{\xi S}] = e^{\lambda (f(\xi) - 1)}$,
we get

$$\gamma(s) = e^{\lambda (\varphi(s) - 1)}.$$

Differentiating this relation we obtain $\gamma'(s) = \lambda \varphi'(s) \gamma(s)$ or, more explicitly,

$$\begin{aligned}
\gamma'(s) &= \lambda \sum_{x=1}^{\infty} x f_x s^{x-1} \sum_{y=0}^{\infty} s^y g_y \\
&= \lambda \sum_{x=1}^{\infty} \sum_{y=0}^{\infty} s^{x+y-1} x f_x g_y \\
&= \lambda \sum_{n=1}^{\infty} \sum_{x=1}^{n} s^{n-1} x f_x g_{n-x}.
\end{aligned}$$

But we also have $\gamma'(s) = \sum_{n=1}^{\infty} n g_n s^{n-1}$. Equating these two expressions for
$\gamma'(s)$ yields

$$n g_n = \lambda \sum_{x=1}^{n} x f_x g_{n-x}, \qquad n = 1, 2, 3, \ldots \tag{3}$$

The probability $g_0$ is determined by noting that $g_0 = \gamma(0) = e^{\lambda (\varphi(0) - 1)} = e^{-\lambda}$,
where the last equality follows since $\varphi(0) = 0$. Given $g_0$, the probabilities
$\{g_n\}_{n \ge 1}$ are then successively obtained from the equations (3). We get

$$\begin{aligned}
g_1 &= \lambda f_1 g_0, \\
g_2 &= \lambda (f_1 g_1 + 2 f_2 g_0)/2, \\
g_n &= \lambda (f_1 g_{n-1} + 2 f_2 g_{n-2} + \cdots + n f_n g_0)/n.
\end{aligned}$$

As described above, an important quantity is $G_m := P(S > m)$, $m \ge 0$. Noting
that $G_m = \sum_{n=m+1}^{\infty} g_n$, the probabilities $\{G_m\}$ can be calculated together with
$\{g_n\}$ using the formula $G_m = G_{m-1} - g_m$, with $G_{-1} = 1$. Finally we remark
that, if the $X_k$'s are not integer-valued, they can be approximated by some
suitable discretization and the Panjer recursion can then be applied to this
distribution.
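The recursion (3), together with the update $G_m = G_{m-1} - g_m$, translates almost line by line into code. The following is a minimal sketch under stated assumptions: the function names and the example loss distribution (sizes 1 and 2 with equal probability, $\lambda = 3$) are illustrative and not taken from the notes.

```python
import math

def panjer_compound_poisson(lam, f, n_max):
    """Return [g_0, ..., g_n_max] with g_n = P(S = n), using recursion (3).

    f maps a loss size x in {1, 2, ...} to f_x = P(X_k = x).
    """
    g = [math.exp(-lam)]                       # g_0 = e^{-lambda}
    for n in range(1, n_max + 1):
        acc = sum(x * f.get(x, 0.0) * g[n - x] for x in range(1, n + 1))
        g.append(lam * acc / n)
    return g

def tail_probabilities(g):
    """Return [G_0, ..., G_m] with G_m = P(S > m), using G_m = G_{m-1} - g_m."""
    tails, current = [], 1.0                   # G_{-1} = 1
    for g_m in g:
        current -= g_m
        tails.append(current)
    return tails

# Hypothetical example: losses of size 1 or 2 with equal probability.
g = panjer_compound_poisson(lam=3.0, f={1: 0.5, 2: 0.5}, n_max=40)
G = tail_probabilities(g)
print(g[:4], G[10])
```

A simple sanity check is the case $f_1 = 1$, where $S$ is Poisson distributed with mean $\lambda$ and the recursion must reproduce the Poisson probabilities.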
2.4 Approximations of P(S(t) > tx)



In this section we derive two useful approximations of $P(S(t) > tx)$. They both
involve the generating function $g(\xi)$ and are fairly easy to calculate when this
function is known.



2.4.1 Chernoff bound 

The first approximation is based on an inequality, Chernoff's inequality, that is
used in many statistical contexts. To derive it, introduce the notation $F(t, dx) =
P(S(t) \in dx)$, fix $\xi > 0$, and note that

$$\begin{aligned}
e^{t g(\xi)} = E[e^{\xi S(t)}] &= \int_0^{\infty} e^{\xi y} F(t, dy) \\
&\ge \int_{y > tx} e^{\xi y} F(t, dy) \\
&\ge e^{\xi t x} \int_{y > tx} F(t, dy) \\
&= e^{\xi t x} P(S(t) > tx).
\end{aligned}$$

Consequently we have $P(S(t) > tx) \le e^{-t (x \xi - g(\xi))}$ for all $\xi > 0$. Clearly the
best upper bound is obtained if $\xi > 0$ is picked so that $x \xi - g(\xi)$ is maximized.
Define

$$h(x) = \max_{\xi} \{x \xi - g(\xi)\}$$

and write $\xi_x$ for the maximizing $\xi$-value. We then have

$$P(S(t) > tx) \le e^{-t h(x)} \quad \text{if } \xi_x > 0.$$

Analogously, it can be seen that

$$P(S(t) < tx) \le e^{-t h(x)} \quad \text{if } \xi_x < 0.$$

The function $h(x)$ will play an important role in what follows, and we need
to study its properties more closely. To this end, first consider the function
$g(\xi) = \lambda \int_0^{\infty} (e^{\xi x} - 1) F(dx)$. We will assume that $g(\xi) < \infty$ for $\xi < \bar{\xi}$, where
$\bar{\xi} > 0$, that $g(\xi) \to \infty$ as $\xi \to \bar{\xi}$ and also that $g'(\xi) \to \infty$ as $\xi \to \bar{\xi}$. Since
$g'(\xi) = \lambda \int_0^{\infty} x e^{\xi x} F(dx)$ and $g''(\xi) = \lambda \int_0^{\infty} x^2 e^{\xi x} F(dx)$ are both positive, the
derivative $g'(\xi)$ increases monotonically from 0 to $\infty$ as $\xi$ increases from $-\infty$ to
$\bar{\xi}$. Hence $g(\xi)$ is strictly convex and increases from $-\lambda$ to $\infty$ for these $\xi$-values,
see Figure 3(a).

Now consider the function $h(x)$. The maximizing value $\xi_x$ must satisfy $g'(\xi_x) =
x$ and, since $g'(\xi)$ is strictly increasing and continuous, for each $x > 0$ this equation
has exactly one solution $\xi_x < \bar{\xi}$. Furthermore, the fact that $g'(\xi)$ is strictly
increasing also implies that $\xi_x > 0$ if and only if $x = g'(\xi_x) > g'(0) = \lambda \mu$.
Hence Chernoff's inequalities tell us that

(i) $P(S(t) > tx) \le e^{-t h(x)}$ if $x > \lambda \mu$;

(ii) $P(S(t) < tx) \le e^{-t h(x)}$ if $x < \lambda \mu$.

A picture of the geometrical construction of the function $h(x)$ is shown in Figure
3(b). Consider the problem of finding a tangent $-h + x \xi$, with given slope $x > 0$,
to the curve $g(\xi)$. The tangent point $\xi_x$ satisfies $g'(\xi_x) = x$ and $h = h(x)$ is
determined so that $-h + x \xi_x = g(\xi_x)$, that is, we have $h(x) = x \xi_x - g(\xi_x)$. As
can be seen in the figure, $h(x) \ge 0$ for all $x > 0$ and $h(x) = 0$ when $\xi_x = 0$. The
geometrical construction can be thought of as if a line $-h + x \xi$, with $x$ fixed,
is pushed upwards towards the curve $g(\xi)$ until a point is found where the line
coincides with the tangent of the curve. This means that we are looking for
the smallest value of $h$ such that $-h + x \xi \le g(\xi)$ for all $\xi$, that is, such that
$h \ge x \xi - g(\xi)$ for all $\xi$. Hence the critical value is $h(x) = \max_{\xi} \{x \xi - g(\xi)\}$.

The derivative of $h(x)$ is

$$\begin{aligned}
h'(x) &= \frac{d}{dx} \left( x \xi_x - g(\xi_x) \right) \\
&= \xi_x + \frac{d \xi_x}{dx} \left( x - g'(\xi_x) \right) \\
&= \xi_x,
\end{aligned}$$

where the last equality follows because $x = g'(\xi_x)$. The relation $x = g'(\xi_x)$
between $x$ and $\xi_x$ is one-to-one and differentiable. We have $\frac{dx}{d\xi_x} = g''(\xi_x)$, and, since
$g''(\xi) > 0$, it follows that $\frac{d\xi_x}{dx} = 1/g''(\xi_x)$. Using this, we get

$$h''(x) = \frac{d \xi_x}{dx} = \frac{1}{g''(\xi_x)} > 0,$$

which means that $h(x)$ is also strictly convex. Remembering that $h(x) \ge 0$ for
all $x$, and $h(\lambda \mu) = 0$, we can draw $h$ as in Figure 3(c).

The relation between $g(\xi)$ and $h(x)$ can be inverted. For $\xi = \xi_x$, we have
$g(\xi) = x \xi - h(x)$ and $\xi = h'(x)$. This implies that $g(\xi)$ is given by the formula
$g(\xi) = \max_x \{x \xi - h(x)\}$, which is analogous to the formula for $h(x)$. In the
theory of convex functions this relation is well known and $g(\xi)$ and $h(x)$ are
said to be each other's Legendre transforms.

Since $h(x) > 0$ for $x \ne \lambda \mu$, Chernoff's inequalities tell us that, when $x > \lambda \mu$ is
fixed, the probability of the event $\{S(t)/t > x\}$ decays exponentially as $t \to \infty$.
Such exponential estimates are common in the theory of large deviations. Here
"deviations" refers to deviations from the mean and "large" refers to the fact
that the deviations are large compared to the deviations treated by the central
limit theorem, where $x = \lambda \mu + y \sqrt{\lambda \nu / t}$ as $t \to \infty$. The function $h(x)$ that measures
the decay of the deviation probability is a fundamental object. It is called the
entropy function of the distribution $F(dx)$. Let us give some examples of how
it is calculated for different distributions.

Figure 3: The functions g(ξ) and h(x). (a) g(ξ). (b) Construction of h(x). (c) h(x).

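1. The exponential distribution, $F(dx) = e^{-x} dx$: For $\xi < 1$, we have

$$g(\xi) = \lambda \int_0^{\infty} (e^{\xi x} - 1) e^{-x} dx = \lambda \left( \frac{1}{1 - \xi} - 1 \right) = \frac{\lambda \xi}{1 - \xi}.$$

This yields $g'(\xi) = \lambda / (1 - \xi)^2$ and the equation $x = g'(\xi)$ hence becomes
$1 - \xi = \sqrt{\lambda / x}$. Thus

$$h(x) = x \xi_x - g(\xi_x) = x + \lambda - 2 \sqrt{\lambda x} = \left( \sqrt{x} - \sqrt{\lambda} \right)^2.$$

2. The one-point distribution $F(dx) = \delta(x - 1)$ gives $g(\xi) = \lambda (e^{\xi} - 1)$ and
$g'(\xi) = \lambda e^{\xi}$. Putting $x = g'(\xi)$, we get $\xi = \log(x / \lambda)$ and hence

$$h(x) = x \log\left( \frac{x}{\lambda} \right) - \lambda \left( \frac{x}{\lambda} - 1 \right).$$

3. The Gamma distribution with $\mu = \alpha$, $F(dx) = \gamma_{\alpha}(x) dx$, gives $g(\xi) =
\lambda ((1 - \xi)^{-\alpha} - 1)$ and $g'(\xi) = \lambda \alpha (1 - \xi)^{-(\alpha + 1)}$. The relation $x = g'(\xi)$
implies that

$$1 - \xi_x = \left( \frac{\lambda \alpha}{x} \right)^{1/(\alpha + 1)}$$

and hence

$$h(x) = x \xi_x - g(\xi_x) = x + \lambda - \frac{\alpha + 1}{\alpha} \, x \left( \frac{\lambda \alpha}{x} \right)^{1/(\alpha + 1)}.$$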
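When no closed form is available, $h(x)$ can be computed numerically by solving $g'(\xi) = x$ and evaluating $x \xi_x - g(\xi_x)$. The following sketch is an illustration under stated assumptions: the helper names, the bisection bounds and the values $\lambda = 2$, $x = 5$ are not from the notes. It compares the numerical result with the closed forms of Examples 1 and 2.

```python
import math

def solve_increasing(func, target, lo, hi, iterations=200):
    """Bisection for func(xi) = target, assuming func is increasing on [lo, hi]."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if func(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def entropy_function(g, g_prime, x, lo, hi):
    """h(x) = x*xi_x - g(xi_x), where xi_x solves g'(xi) = x on (lo, hi)."""
    xi_x = solve_increasing(g_prime, x, lo, hi)
    return x * xi_x - g(xi_x)

lam = 2.0
# Example 1: exponential losses, g(xi) = lam*xi/(1 - xi) for xi < 1.
g_exp = lambda xi: lam * xi / (1.0 - xi)
dg_exp = lambda xi: lam / (1.0 - xi) ** 2
# Example 2: unit losses, g(xi) = lam*(exp(xi) - 1).
g_one = lambda xi: lam * (math.exp(xi) - 1.0)
dg_one = lambda xi: lam * math.exp(xi)

x = 5.0
print(entropy_function(g_exp, dg_exp, x, -50.0, 1.0 - 1e-12),
      (math.sqrt(x) - math.sqrt(lam)) ** 2)
print(entropy_function(g_one, dg_one, x, -50.0, 50.0),
      x * math.log(x / lam) - x + lam)
```

Bisection is used here only because $g'$ is increasing; any root finder would do. Since $\xi_x$ maximizes $x\xi - g(\xi)$, small errors in $\xi_x$ have little effect on the computed $h(x)$.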
2.4.2 Esscher's approximation

The second approximation of $P(S(t) > tx)$ is the so-called Esscher approximation,
which is an asymptotic formula, valid as $t \to \infty$. It states that

$$P(S(t) > tx) \approx \frac{C}{\sqrt{t}} e^{-t h(x)} \quad \text{as } t \to \infty$$

in the sense that the quotient between the left hand side and the right hand side
tends to 1. Here $C > 0$ is a constant, and the correction factor $C / \sqrt{t}$ refines
the exponential bound derived in the previous section.
We will see that in many cases this formula gives a good approximation also for
moderate values of $t$ and that it is easy to calculate numerically if the function
$g(\xi)$ is available.

Since the process $\{S(t)\}$ has independent increments, it obeys the central limit
theorem, that is, $(S(t) - t \lambda \mu) / \sqrt{t \lambda \nu}$ is approximately normally distributed as
$t \to \infty$. This means that, with $x = \lambda \mu + y \sqrt{\lambda \nu / t}$, we have that

$$P(S(t) > tx) \to 1 - \Phi(y) \quad \text{as } t \to \infty,$$

where $\Phi(y)$ denotes the standard normal distribution function. The central
limit theorem hence gives an approximation for "normal" deviations - that is,
deviations of the form $y \sqrt{\lambda \nu / t}$ - from the mean $\lambda \mu$. However, if we want to
study "large" deviations, with $x > \lambda \mu$ fixed as $t \to \infty$, then this approximation
is not sufficient. Below we will see that this problem can be circumvented by
modifying the distribution $F(dx)$ - and thereby also the distribution of $S(t)$
- so that it becomes centered at the value $tx$ that we are interested in. The
central limit theorem can then be applied to the transformed distribution to get
an approximation that can be used also for the original distribution close to the
value $tx$.

The modification of the distribution $F(dx)$ that we will use is called the Esscher
transform. It is obtained by introducing a distribution that is proportional to
$e^{ax}$ with respect to $F(dx)$, where $a$ is a parameter that can be chosen freely. To
be more precise, we embed $F(dx)$ in an exponential family by defining

$$F_a(dx) = \frac{e^{ax}}{f(a)} F(dx),$$

where $f(a) = \int_0^{\infty} e^{ax} F(dx)$. For $a < \bar{\xi}$ we have $f(a) < \infty$ and hence $F_a(dx)$
is a probability distribution. Now let $F_a(dx)$ be the modified distribution of
$\{X_k\}$. It defines a different distribution of $S_n = \sum_{k=1}^{n} X_k$. Write $P_a(\cdot)$ for the
modified probabilities and $E_a[\cdot]$ for the corresponding means. Furthermore, let
$f_a(\xi) := E_a[e^{\xi X_1}]$ denote the generating function of $F_a(dx)$. We then have

$$f_a(\xi) = \int_0^{\infty} e^{\xi x} F_a(dx) = \int_0^{\infty} e^{\xi x} \frac{e^{ax}}{f(a)} F(dx) = \frac{f(a + \xi)}{f(a)}. \tag{4}$$

Since the $X_k$'s are independent also under the measure $P_a$, the distribution of
$S_n$ under this measure is given by $F_a^{n*}(dx)$ - the convolution of $F_a$ with itself
$n$ times. Hence $E_a[e^{\xi S_n}] = \int_0^{\infty} e^{\xi x} F_a^{n*}(dx)$. But, using (4), we also have

$$E_a[e^{\xi S_n}] = f_a^n(\xi) = \frac{f^n(a + \xi)}{f^n(a)} = \frac{1}{f^n(a)} E[e^{(a + \xi) S_n}] = \int_0^{\infty} e^{\xi x} \frac{e^{ax}}{f^n(a)} F^{n*}(dx).$$

Thus

$$F_a^{n*}(dx) = \frac{e^{ax}}{f^n(a)} F^{n*}(dx),$$

that is, $F_a^{n*}$ is the Esscher transform of $F^{n*}$. This means that the original
distribution of $S_n$ can be expressed in terms of the modified one via the relation
$F^{n*}(dx) = f^n(a) e^{-ax} F_a^{n*}(dx)$. Hence, if we can approximate $F_a^{n*}(dx)$ for some
choice of $a$, we can also approximate $F^{n*}(dx)$ via this relation. The reason for
picking an exponential density for $\{X_k\}$ is that this is the only case when the
transformed distribution of $S_n$ is obtained by applying the same transform to
the original distribution of $S_n$.
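As a concrete illustration, here is a worked example under the assumption of standard exponential losses (the same distribution as in Example 1 of Section 2.4.1). In this case the transform and relation (4) can be written out explicitly, and the transformed distribution is again exponential, with rate $1 - a$:

```latex
\begin{align*}
F(dx) &= e^{-x}\,dx, &
f(a) &= \int_0^\infty e^{ax} e^{-x}\,dx = \frac{1}{1-a} \quad (a < 1), \\
F_a(dx) &= \frac{e^{ax}}{f(a)}\,F(dx) = (1-a)\,e^{-(1-a)x}\,dx, &
f_a(\xi) &= \frac{1-a}{1-a-\xi} = \frac{f(a+\xi)}{f(a)} \quad (\xi < 1-a).
\end{align*}
```

Choosing $a > 0$ thus makes large losses more likely under $P_a$, which is the effect that will be used to center the distribution at $tx$.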

Now let us make an analogous transformation of $S(t)$. Write $P_a(S(t) \in dx) =
F_a(t, dx)$ and define

$$F_a(t, dx) = e^{ax - t g(a)} F(t, dx).$$

Remembering that $E[e^{\xi S(t)}] = e^{t g(\xi)}$, we then have

$$\int_0^{\infty} F_a(t, dx) = e^{-t g(a)} \int_0^{\infty} e^{ax} F(t, dx) = 1,$$

so that $F_a(t, dx)$ is indeed a probability distribution. The generating function
is given by

$$\begin{aligned}
E_a[e^{\xi S(t)}] &= \int_0^{\infty} e^{\xi x} e^{ax - t g(a)} F(t, dx) \\
&= E[e^{(a + \xi) S(t) - t g(a)}] \\
&= e^{t (g(a + \xi) - g(a))}.
\end{aligned}$$

Hence we have $E_a[e^{\xi S(t)}] = e^{t g_a(\xi)}$, where $g_a(\xi) = g(a + \xi) - g(a)$. This means
that $S(t)$ is still a compound Poisson process, since

$$\begin{aligned}
g_a(\xi) &= \lambda \int_0^{\infty} \left( e^{(a + \xi) x} - e^{ax} \right) F(dx) \\
&= \lambda f(a) \int_0^{\infty} \left( e^{\xi x} - 1 \right) F_a(dx).
\end{aligned}$$

We thus have the important relation that, under the measure $P_a$, $S(t)$ has a
compound Poisson distribution with $\lambda_a = \lambda f(a)$, jump distribution $F_a(dx)$ and
generating function determined by $g_a(\xi) = g(a + \xi) - g(a)$. The last equation immediately gives
us the mean and variance. We have

$$E_a[S(t)] = t g_a'(0) = t g'(a)$$

and

$$Var_a(S(t)) = t g_a''(0) = t g''(a).$$

Just as for $S_n$, we have a simple expression for $F(t, dx)$ in terms of $F_a(t, dx)$,
namely

$$F(t, dx) = e^{t g(a) - ax} F_a(t, dx).$$

We will now see how this expression can be used to study large deviations for
$S(t)$. Consider the probability $P(S(t) > tx)$ with $x > \lambda \mu$. Center $P_a$ by choosing
$a$ such that $E_a[S(t)] = tx$, that is, such that $g'(a) = x$. We have previously seen
that this equation has a unique, strictly positive solution if $x > \lambda \mu = g'(0)$.
The central limit theorem can now be used to approximate the distribution of
$S(t)$ under the measure $P_a$ near its mean $tx$: Put $S(t) = t g'(a) + Y$. Then, as
$t \to \infty$, the distribution of $Y$ is approximately normal with mean 0 and variance
$\sigma^2 = t g''(a)$. Furthermore,

$$\begin{aligned}
P(S(t) > tx) &= \int_{y > tx} e^{t g(a) - a y} F_a(t, dy) \\
&= e^{t g(a)} E_a\left[ e^{-a S(t)}, \, S(t) > tx \right] \\
&= e^{t g(a)} E_a\left[ e^{-a (t g'(a) + Y)}, \, Y > 0 \right] \\
&= e^{t (g(a) - a g'(a))} E_a\left[ e^{-a Y}, \, Y > 0 \right].
\end{aligned}$$

In the previous section we saw that, when $x = g'(a)$, we have $a g'(a) - g(a) = h(x)$
and hence we arrive at the fundamental formula

$$P(S(t) > tx) = e^{-t h(x)} E_a\left[ e^{-a Y}, \, Y > 0 \right].$$

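Now, if $Y$ has a density, the normal approximation for $Y$ implies that

$$E_a\left[ e^{-a Y}, \, Y > 0 \right] \approx \int_0^{\infty} e^{-a y} \varphi\left( \frac{y}{\sigma} \right) \frac{dy}{\sigma} = \int_0^{\infty} e^{-a \sigma y} \varphi(y) \, dy,$$

where $\varphi(y) = e^{-y^2/2} / \sqrt{2\pi}$ denotes the standard normal density. In the literature,

$$\begin{aligned}
E(s) &= \int_0^{\infty} e^{-s y} \varphi(y) \, dy \\
&= \frac{1}{\sqrt{2\pi}} \int_0^{\infty} e^{-s y - y^2/2} \, dy \\
&= e^{s^2/2} \frac{1}{\sqrt{2\pi}} \int_0^{\infty} e^{-(y + s)^2/2} \, dy \\
&= e^{s^2/2} \left( 1 - \Phi(s) \right)
\end{aligned}$$

is referred to as the Esscher function and, in terms of this function, we have now
derived Esscher's approximation formula, which states that

$$P(S(t) > tx) \approx e^{-t h(x)} E(a \sigma) \quad \text{as } t \to \infty,$$

where $a > 0$ is determined by the relation $x = g'(a)$ and $\sigma = \sqrt{t g''(a)}$. The
formula is only valid if $F(dx)$ has a density, but later we will see that there is a
similar approximation if $F(dx)$ is a discrete distribution.

As $t \to \infty$, the same holds for $s = a \sigma$, and from the definition of $E(s)$ we see that,
for large $s$, the exponential function is quickly damped as $y$ grows so that only
values near $y = 0$ are essential. Near $y = 0$, we have $\varphi(y) \approx (1 - y^2/2) / \sqrt{2\pi}$
and hence

$$E(s) \approx \frac{1}{\sqrt{2\pi}} \int_0^{\infty} e^{-s y} \left( 1 - \frac{y^2}{2} \right) dy = \frac{1}{\sqrt{2\pi}} \left( \frac{1}{s} - \frac{1}{s^3} \right) \approx \frac{1}{\sqrt{2\pi} \, s}.$$

Thus we have the more explicit formula

$$P(S(t) > tx) \approx \frac{e^{-t h(x)}}{\sqrt{2\pi} \, a \sqrt{t g''(a)}}, \qquad x = g'(a), \tag{5}$$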

which is also referred to as Esscher's approximation. The formula is reasonably
easy to implement numerically provided that it is possible to compute the
function $g(a)$ and its derivatives. If $x = g'(a)$, $h(x) = a g'(a) - g(a)$ and $\sigma = \sqrt{t g''(a)}$
are computed for sufficiently many values of $a > 0$, the Esscher approximations
can also be computed and thereby we have an approximation for sufficiently
many values of $x$. This method gives an approximation that is good enough for
all distributions that occur in practice.
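The numerical procedure described above, computing $x$, $h(x)$ and $\sigma$ over a grid of values $a > 0$, can be sketched as follows. This is a minimal illustration, assuming exponential losses with mean 1 (so that $g(a) = \lambda a / (1 - a)$ for $a < 1$). The function names and the parameter values $\lambda = 1$, $t = 50$ are assumptions made for the example.

```python
import math

def esscher_function(s):
    """E(s) = exp(s^2/2) * (1 - Phi(s)), with 1 - Phi(s) = erfc(s/sqrt(2))/2."""
    return math.exp(0.5 * s * s) * 0.5 * math.erfc(s / math.sqrt(2.0))

def esscher_tail_exponential(lam, t, a):
    """Approximate P(S(t) > t*x) for Exp(1) losses, parametrised by a in (0, 1).

    Returns (x, Chernoff bound, approximation with E(a*sigma), formula (5)).
    """
    g = lam * a / (1.0 - a)              # g(a)
    g1 = lam / (1.0 - a) ** 2            # g'(a) = x
    g2 = 2.0 * lam / (1.0 - a) ** 3      # g''(a)
    h = a * g1 - g                       # h(x) = a g'(a) - g(a)
    sigma = math.sqrt(t * g2)
    chernoff = math.exp(-t * h)
    with_e = chernoff * esscher_function(a * sigma)
    explicit = chernoff / (math.sqrt(2.0 * math.pi) * a * sigma)
    return g1, chernoff, with_e, explicit

for a in (0.1, 0.2, 0.3, 0.4):
    x, bound, approx, explicit = esscher_tail_exponential(lam=1.0, t=50.0, a=a)
    print(f"x = {x:.3f}: Chernoff {bound:.3e}, Esscher {approx:.3e}, (5) {explicit:.3e}")
```

Parametrising by $a$ rather than by $x$ avoids solving $g'(a) = x$. Both Esscher variants lie below the Chernoff bound, and the version using $E(a\sigma)$ is slightly smaller than the explicit formula (5), since $E(s) < 1/(\sqrt{2\pi}\,s)$ for $s > 0$.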

Even if the condition that $F(dx)$ has a density is not fulfilled, it is possible to
derive an analogous approximation formula when $F(dx)$ is a discrete distribution
such that $X_k$ takes values of the form $nd$, for some constant $d$ and $n = 0, 1, 2, \ldots$.
To do this, note that if $X_k \in \{nd; \, n = 0, 1, 2, \ldots\}$, the same thing holds for
$S(t)$. The normal approximation for $Y$ becomes

$$P(Y = y) \approx \varphi\left( \frac{y}{\sigma} \right) \frac{d}{\sigma} \quad \text{for } y = n \cdot d,$$

and hence

$$E_a\left[ e^{-a Y}, \, Y \ge 0 \right] \approx \sum_{n=0}^{\infty} e^{-a n d} \varphi\left( \frac{n d}{\sigma} \right) \frac{d}{\sigma}.$$

In this case it is natural to introduce the discrete Esscher function

$$E(s, b) = \sum_{n=0}^{\infty} e^{-s n} \varphi(n b) \, b.$$

The Esscher approximation then becomes

$$P(S(t) > tx) \approx e^{-t h(x)} E\left( a d, \frac{d}{\sigma} \right).$$

As $b \to 0$ we have

$$E(s, b) \approx \sum_{n=0}^{\infty} e^{-s n} \varphi(0) \, b = \frac{b}{\sqrt{2\pi} \, (1 - e^{-s})},$$

that is,

$$E\left( a d, \frac{d}{\sigma} \right) \approx \frac{1}{\sqrt{2\pi} \, \sigma} \cdot \frac{d}{1 - e^{-a d}} \quad \text{as } \sigma \to \infty.$$

Hence, in the discrete case we have the modified Esscher approximation

$$P(S(t) > tx) \approx \frac{e^{-t h(x)}}{\sqrt{2\pi} \, A(d) \sqrt{t g''(a)}},$$

where $A(d) = (1 - e^{-a d}) / d$ and $x = g'(a)$. As $d \to 0$ we see that $A(d) \to a$ and
hence the formula is consistent with (5).
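For unit losses (Example 2 of Section 2.4.1, with $d = 1$), $S(t)$ is Poisson distributed with mean $\lambda t$, so the modified approximation can be compared with an exact tail probability. The sketch below is illustrative and rests on assumptions: the function names and parameter values are not from the notes, and the comparison is made with $P(S(t) \ge tx)$ at a lattice point $tx$, which is one reading of the sum over $n \ge 0$ in the derivation above.

```python
import math

def poisson_upper_tail(mean, k):
    """Exact P(N >= k) for N ~ Poisson(mean), via the complementary sum."""
    term = math.exp(-mean)               # P(N = 0)
    below = 0.0
    for n in range(k):
        below += term
        term *= mean / (n + 1)
    return 1.0 - below

def discrete_esscher_unit_losses(lam, t, x, d=1.0):
    """Modified Esscher approximation when F is a point mass at 1 (lattice step d = 1)."""
    a = math.log(x / lam)                # solves g'(a) = lam*exp(a) = x
    h = x * a - x + lam                  # h(x) = x log(x/lam) - x + lam
    g2 = lam * math.exp(a)               # g''(a)
    big_a = (1.0 - math.exp(-a * d)) / d # A(d)
    return math.exp(-t * h) / (math.sqrt(2.0 * math.pi) * big_a * math.sqrt(t * g2))

lam, t, x = 1.0, 20.0, 2.0
approx = discrete_esscher_unit_losses(lam, t, x)
exact = poisson_upper_tail(lam * t, int(t * x))   # P(S(t) >= t*x), t*x on the lattice
print(approx, exact, approx / exact)
```

Even for the moderate value $t = 20$ the ratio of the two numbers should be close to 1, which illustrates the remark that the approximation is often usable for moderate $t$.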

The Esscher approximation holds analogously for $P(S(t) < tx)$ when $x =
g'(a) < \lambda \mu$ with $a < 0$. More generally, it holds for any probability $P(S(t)/t \in I)$,
where $I$ is an interval $[z, y]$ with $\lambda \mu < z < y$ or $[y, z]$ with $y < z < \lambda \mu$. In both
cases, $a$ should be chosen so that $g'(a) = x$, where $x$ is the point in $I$ where $h(x)$
is as small as possible, that is, the exponent is always given by $\min_{x \in I} h(x)$.
The general formula is

$$P(S(t)/t \in I) \approx \frac{C}{\sqrt{t}} e^{-t \min_{x \in I} h(x)}$$

for some constant $C > 0$. This type of estimate is common in the more general
theory of large deviations that has been developed during the last decades,
inspired by the pioneering work of Esscher from the 1930s.



3 Theory of ruin probabilities 

So far we have studied the total loss $S(t)$ without taking the flow of premiums
in time into account, that is, we have only considered $S(t)$ at a fixed time $t$. In
this case it is relevant to study $P(S(t) > tx)$ as we did in the previous section.
The number $tx$ should be thought of as the capital available at time $t$ - that
is, the sum of the capital at $t = 0$ and the amount of premiums paid in
the interval $(0, t)$ - and we want to make sure that this capital is large enough
to make the probability $P(S(t) > tx)$ reasonably small. In such a setting we do not take the
possibility that a deficit might arise before time $t$ into account.

To study the course of events in time we need to describe the flow of premiums.
This might also be stochastic, but here we will restrict ourselves to the simplest
setting, where the premiums constitute a constant continuous inflow so that
the total premium paid in the interval $(0, t)$ is $ct$. If the capital at time $t = 0$
is $u$, the surplus at time $t$ is then given by $u + ct - S(t)$ (for simplicity we
disregard income from interest). In the following we will study the so-called
ruin probability, that is, the probability that the surplus is negative at some
time point during the planning period $(0, t)$, where $t = \infty$ is also a possibility.
In particular, we will see how this probability depends on the parameters $u$, $c$,
$\lambda$ and $F(dx)$.



3.1 The total loss process 

Let us introduce the net amount of loss $U(t) := S(t) - ct$. This is a stochastic
process with upward jumps of height $\{X_k\}$ at times $\{T_k\}$, just as $S(t)$, and in
between these times the process decreases at rate $c$; see Figure 4. Our main
object of interest is the time of ruin, denoted by $T(u)$ and defined as the first
time when $U(t) > u$. As we can see in Figure 4, if $u > 0$, the ruin occurs at
the first time $T_k$ such that $S_k - c T_k > u$, where $S_k = \sum_{j=1}^{k} X_j$ as before. That is, the ruin does not occur in
between two loss occasions, which means that in general a non-zero deficit arises
at time $T(u)$. We will also study $T(-u)$, which is the first time when $U(t) < -u$.
Since the heights of the jumps are strictly positive, this occurs in between the
jump occasions so that, unlike what holds for $T(u)$, we have $U(t) = -u$ at time
$t = T(-u)$. This will turn out to be a useful fact.

Figure 4: The net loss cost process U(t).

Now assume that $u > 0$ and define the ruin probabilities as

$$\begin{aligned}
r(u, t) &= P(T(u) < t), \quad \text{for } t < \infty; \\
r(u) &= P(T(u) < \infty),
\end{aligned}$$

and, analogously,

$$\begin{aligned}
r(-u, t) &= P(T(-u) < t), \quad \text{for } t < \infty; \\
r(-u) &= P(T(-u) < \infty).
\end{aligned}$$

We also define $T(\pm u) = \infty$ if the passage to $\pm u$ never occurs, which, as we will
see, happens with positive probability.

Classical risk theory has to a large extent been concerned with finding equations
for $r(u)$ and $r(u, t)$ and, on the basis of these equations, deriving approximations
analogous to the ones derived in the previous section for the distribution of $S(t)$.
In the following we will treat these problems, using more probabilistic methods
than the traditional ones. This often leads to a better understanding of why the
approximations are valid and also to many simplifications of the derivations.

Before moving on to the mathematical treatment, we remark that $T(u)$ is of
course the natural ruin time when we have a positive "risk sum", that is, when
the loss amounts $X_k$ are positive and the premium inflow has rate $c > 0$. This is
the natural model for property insurance and whole life assurance. However, we
can also apply the model to life assurance with negative risk sum. In this case we
have a continuous outflow of payments $ct$ and $S(t)$ represents the accumulated
inflow of profits made at the times of the deaths. The ruin occurs when $ct -
S(t) > u$ for the first time, that is, at time $T(-u)$. Hence $T(-u)$ also has a
natural interpretation and $r(-u, t)$ and $r(-u)$ are the ruin probabilities in this
case.
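Since ruin at level $u > 0$ can only occur at a loss time $T_k$, the finite-horizon ruin probability $r(u, t)$ is easy to estimate by simulation: it suffices to inspect $U(T_k) = S_k - c T_k$ at each loss. The sketch below is a plain Monte Carlo illustration, not one of the analytical methods developed in these notes. The function names, the seed and the parameters (exponential losses with mean 1, $\lambda = 1$, $c = 1.2$) are assumptions.

```python
import random

def ruin_before(u, c, lam, horizon, draw_loss, rng):
    """True if U(t) = S(t) - c*t exceeds u at some loss time T_k <= horizon."""
    clock, total = rng.expovariate(lam), 0.0
    while clock <= horizon:
        total += draw_loss(rng)
        if total - c * clock > u:        # ruin can only happen at a jump
            return True
        clock += rng.expovariate(lam)
    return False

def estimate_ruin_probability(u, c, lam, horizon, draw_loss, n_paths, seed=0):
    """Fraction of simulated paths that are ruined before the horizon."""
    rng = random.Random(seed)
    hits = sum(ruin_before(u, c, lam, horizon, draw_loss, rng) for _ in range(n_paths))
    return hits / n_paths

# Assumed parameters: Exp(1) losses, lam = 1, premium rate c = 1.2.
exp_loss = lambda r: r.expovariate(1.0)
for u in (0.0, 5.0, 10.0):
    print(u, estimate_ruin_probability(u, 1.2, 1.0, 50.0, exp_loss, 2000))
```

The estimates decrease as the initial capital $u$ grows, which is the dependence on $u$ that the following sections quantify.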




3.2 Basic formulas for the ruin probabilities 

We begin by deriving a clever formula for the ruin probability when $u = 0$.
The formula will turn out to be useful also in finding expressions for the ruin
probabilities when $u \ne 0$, as has been shown by Lajos Takács. First consider
the event $A_t := \{T(0) > t\}$ that the time to ruin exceeds $t$ and note that

$$A_t = \{S(t') < ct' \text{ for all } t' \in (0,t)\} = \{S_k < cT_k \text{ for } k = 1, 2, \ldots, N(t)\};$$

see the accompanying figure for an illustration. The following lemma gives a simple
formula for the probability of $A_t$ given that $S(t) = x$, $0 \le x \le ct$.



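Lemma 3.1 We have

$$P(A_t \mid S(t) = x) = \left(1 - \frac{x}{ct}\right)_+,$$

where

$$\left(1 - \frac{x}{ct}\right)_+ = \begin{cases} 1 - \dfrac{x}{ct} & \text{if } 0 \le x \le ct, \\ 0 & \text{if } x > ct. \end{cases}$$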



Proof: We will use induction over $N(t) = n$ to show the slightly stronger
statement that

$$P(A_t \mid S(t) = x, N(t) = n) = \left(1 - \frac{x}{ct}\right)_+. \tag{6}$$

To this end, first consider the case $n = 0$. Then $S(t) = 0$, so that only $x = 0$
has to be considered, and the event $A_t$ occurs with probability 1. Hence (6) is
true for $n = 0$. For $n = 1$, the event $A_t$ occurs if and only if $T_1 > x/c$. The
conditional density for $T_1$ given that $N(t) = 1$ is

$$h(z)\,dz = \frac{P(N(0,z) = 0,\ N(z,z+dz) = 1,\ N(z+dz,t) = 0)}{P(N(0,t) = 1)} = \frac{e^{-\lambda z}\,\lambda\,dz\,e^{-\lambda(t-z)}}{\lambda t\,e^{-\lambda t}} = \frac{dz}{t}, \qquad 0 < z < t,$$

that is, a uniform distribution on $(0,t)$. Hence

$$P\left(T_1 > \frac{x}{c} \,\Big|\, S(t) = x, N(t) = 1\right) = \left(1 - \frac{x}{ct}\right)_+,$$

and so (6) is true also for $n = 1$.

Now assume that (6) holds for $N(t) \le n - 1$ and consider the case $S(t) = x$,
$N(t) = n$. Given that $N(t) = n$, the time $T_n$ has the conditional density $f_n(z)$
given by

$$f_n(z)\,dz = \frac{P(N(0,z) = n-1,\ N(z,z+dz) = 1,\ N(z+dz,t) = 0)}{P(N(0,t) = n)} = \frac{\dfrac{(\lambda z)^{n-1}}{(n-1)!}\,e^{-\lambda z}\,\lambda\,dz\,e^{-\lambda(t-z)}}{\dfrac{(\lambda t)^n}{n!}\,e^{-\lambda t}} = n\left(\frac{z}{t}\right)^{n-1}\frac{dz}{t}, \qquad 0 < z < t.$$

If we fix $T_n = z$ and $S(T_{n-1}) = y$, where $0 \le y \le x \le cz \le ct$, it follows from
the induction assumption that the conditional probability for $A_t$ is the same as
for $A_z$, that is, $1 - y/(cz)$. Integrating over $y$ with the conditional distribution of
$S(T_{n-1})$ given $S(T_n)$ yields

$$P(A_t \mid S(t) = x, N(t) = n, T_n = z) = E\left[1 - \frac{S(T_{n-1})}{cz} \,\Big|\, S(T_n) = x\right].$$

By symmetry we have

$$E[S(T_{n-1}) \mid S(T_n)] = (n-1)\,E[X_k \mid S(T_n)] = \frac{n-1}{n}\,S(T_n),$$

and hence

$$P(A_t \mid S(t) = x, N(t) = n, T_n = z) = 1 - \frac{(n-1)x}{ncz}.$$

Integrating over $z \in (x/c, t)$ with the density $f_n(z)$ we finally get

$$P(A_t \mid S(t) = x, N(t) = n) = \int_{x/c}^{t}\left(1 - \frac{(n-1)x}{ncz}\right) n\left(\frac{z}{t}\right)^{n-1}\frac{dz}{t} = \int_{x/c}^{t}\left(n\left(\frac{z}{t}\right)^{n-1} - (n-1)\frac{x}{ct}\left(\frac{z}{t}\right)^{n-2}\right)\frac{dz}{t}$$

$$= \left[\left(\frac{z}{t}\right)^{n} - \frac{x}{ct}\left(\frac{z}{t}\right)^{n-1}\right]_{z = x/c}^{z = t} = 1 - \frac{x}{ct}.$$

The formula (6) now follows by induction. Since the right hand side does not
involve $n$, the conditioning on $N(t) = n$ can be removed without affecting the
formula and hence the lemma is proved. □
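
The last integration in the proof can be checked numerically. The following is a minimal sketch, not part of the original notes: the function name `lemma_integral` and the parameter values are illustrative assumptions, and a plain midpoint rule stands in for the exact integration.

```python
def lemma_integral(n, x, c, t, steps=20000):
    """Midpoint-rule value of the integral over z in (x/c, t) used in the proof.

    The integrand is P(A_t | S(t)=x, N(t)=n, T_n=z) times the density f_n(z).
    """
    lo, hi = x / c, t
    dz = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        z = lo + (i + 0.5) * dz
        cond_prob = 1.0 - (n - 1) * x / (n * c * z)  # 1 - (n-1)x/(ncz)
        density = n * (z / t) ** (n - 1) / t         # f_n(z) = n (z/t)^(n-1) / t
        total += cond_prob * density * dz
    return total


# Illustrative values (assumptions): c = 2, t = 5, x = 3, so 1 - x/(ct) = 0.7.
c, t, x = 2.0, 5.0, 3.0
for n in (1, 2, 3, 7):
    print(n, lemma_integral(n, x, c, t), 1.0 - x / (c * t))
```

For every $n$ the numerical value should agree with $1 - x/(ct)$, which is the statement that the right hand side of (6) does not depend on $n$.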



Multiplying the probability in Lemma 3.1 with $P(S(t) \in dx) = F(t,dx)$ gives
the joint probability

$$P(A_t, S(t) \in dx) = \left(1 - \frac{x}{ct}\right)_+ F(t,dx).$$

Now fix $S(t) = x$ such that $U(t) = S(t) - ct = x - ct < 0$ and write $x - ct = -u$.
We then have

$$P(A_t \mid U(t) = -u) = \left(\frac{u}{ct}\right)_+$$

and

$$P(A_t, U(t) \in -du) = \left(\frac{u}{ct}\right)_+ F(t, ct - du),$$

that is,

$$P(U(t) \in -du,\ U(t') < 0 \text{ for } t' \in (0,t)) = \left(\frac{u}{ct}\right)_+ F(t, ct - du); \tag{7}$$

see panel (a) of the accompanying figure. Integrating this over $x \in [0, ct]$ we obtain
the non-ruin probability $\bar r(0,t) := 1 - r(0,t)$ for the initial capital $u = 0$, that is,

$$\bar r(0,t) = \int_0^{ct}\left(1 - \frac{x}{ct}\right) F(t,dx). \tag{8}$$

In the following sections we will see how these formulas can be used to determine
the ruin probabilities when $u \ne 0$.
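
Formula (8) can be evaluated exactly in simple cases. The sketch below assumes exponentially distributed claims with mean `mu` (an illustrative choice, not made in the text); then $S(t)$ given $N(t) = n$ is a sum of $n$ exponential variables, and $E[(1 - G_n/a)_+] = P(G_n \le a) - (n/a)P(G_{n+1} \le a)$ for a sum $G_n$ of $n$ unit exponentials. All function names are hypothetical.

```python
import math


def erlang_cdf(n, y):
    """P(G_n <= y) for G_n a sum of n independent Exp(1) variables (G_0 = 0)."""
    if n == 0:
        return 1.0
    term, acc = math.exp(-y), 0.0
    for j in range(n):
        acc += term              # term equals e^{-y} y^j / j!
        term *= y / (j + 1)
    return max(0.0, 1.0 - acc)


def survival_from_zero(t, lam, mu, c, n_max=400):
    """Non-ruin probability (8) for u = 0, assuming Exp claims with mean mu."""
    a = c * t / mu                   # the level ct measured in units of mu
    weight = math.exp(-lam * t)      # P(N(t) = 0); this is the atom of F(t, dx) at 0
    total = 0.0
    for n in range(n_max + 1):
        # E[(1 - G_n / a)_+] = P(G_n <= a) - (n / a) P(G_{n+1} <= a)
        total += weight * (erlang_cdf(n, a) - (n / a) * erlang_cdf(n + 1, a))
        weight *= lam * t / (n + 1)  # Poisson weight for the next n
    return total


# Illustrative parameters (assumptions): lam = 1, mu = 1, c = 2.
for t in (1.0, 5.0, 20.0, 60.0):
    print(t, survival_from_zero(t, 1.0, 1.0, 2.0))
```

With these parameters the values decrease in $t$ and approach $1/2$, in line with the law of large numbers applied to $S(t)/(ct)$ in (8).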



3.2.1 The distribution of T(-u) 

Let us first derive a formula for $P(T(-u) \in dt)$. A typical trajectory with
$T(-u) \in dt$ has $U(t') > -u$ for $t' < T(-u)$ and $U(t') = -u$ for $t' = T(-u)$;
see panel (b) of the accompanying figure. If we turn this picture upside down and move
the origin to the crossing point, we see that the trajectory is transformed into the
trajectory in panel (a). Hence, if this transformation does not change the distribution
of the process, the probability that $T(-u) \in dt$ should be the same as the
probability of the event in (7), that is,

$$P(T(-u) \in dt) = \left(\frac{u}{ct}\right)_+ F(t, ct - du)$$

and, since $-du = c\,dt$, we have

$$P(T(-u) \in dt) = \frac{u}{ct}\,F(t, c\,dt - u) \qquad \text{for } ct \ge u > 0.$$

This is an explicit formula for the distribution of $T(-u)$ and we have for
instance that

$$r(-u,t) = P(T(-u) \le t) = \int_{u/c}^{t} \frac{u}{cs}\,F(s, c\,ds - u).$$

To understand that the transformed process has the same distribution as $U(t)$
we can write it as $\tilde U(t') = -u - U(t - t')$, $0 \le t' \le t$. The process $\tilde U(t')$ has jumps
at the time points $\tilde T_k = t - T_k$, which constitute a Poisson process, and the
jumps are $\tilde X_k = X_k$, which are independent with distribution $F(dx)$. Between
the jumps, $\tilde U(t')$ is changed at rate $-c$. Hence $\tilde U(t')$ is a process with the same
distribution as $U(t')$ and initial value $\tilde U(0) = 0$. The jumps occur in a different
order, but this does not affect the distribution. The process $\tilde U(t')$ is illustrated
in panel (c) of the figure.

The ruin probability with $t = \infty$ is

$$r(-u) = \int_{u/c}^{\infty} \frac{u}{cs}\,F(s, c\,ds - u),$$

or, with $x = cs - u$,

$$r(-u) = \int_0^{\infty} \frac{u}{x+u}\,F\!\left(\frac{x+u}{c}, dx\right).$$

We will mainly consider the case when $c > \lambda\mu$ so that $E[U(t)] = -(c - \lambda\mu)t < 0$.
By the law of large numbers,

$$\frac{U(t)}{t} \to -(c - \lambda\mu) < 0 \quad \text{a.s. as } t \to \infty.$$

This implies that, with probability 1, the barrier $-u$ is hit sooner or later, that
is, $r(-u) = 1$, or, equivalently,

$$\int_0^{\infty} \frac{u}{x+u}\,F\!\left(\frac{x+u}{c}, dx\right) = 1 \qquad \text{if } c > \lambda\mu. \tag{9}$$

This relation will prove to be important in what follows.

3.2.2 The distribution of T(u)

We will now derive an explicit formula for $r(u,t) = P(T(u) \le t)$ by using the
previous results and conditioning on the value of $U(t)$. Trivially

$$r(u,t) = P(T(u) \le t, U(t) > u) + P(T(u) \le t, U(t) \le u).$$

If $U(t) > u$, we know for sure that $T(u) \le t$, and hence

$$P(T(u) \le t, U(t) > u) = P(U(t) > u) = \int_{u+ct}^{\infty} F(t,dx).$$

If $T(u) \le t$ and $U(t) \le u$, the trajectory of $U(t)$ has to cross the level $u$ one or
more times between $T(u)$ and $t$; see the accompanying figure. Let $s$ be the last time
when this occurs. The probability for such an outcome is $P(U(s) \in du)\,P(E)$,
where $E$ denotes the event of going from $u$ at $s$ to $u - dy$ at $t$ without exceeding $u$
between $s$ and $t$. The first factor equals $F(s, du + cs) = F(s, u + c\,ds)$. By Lemma
3.1 the last factor equals $(y/(c(t-s)))\,F(t-s, c(t-s) - dy)$ and, integrating over
$y > 0$, we get $\bar r(0, t-s)$, see (8). Combining all this yields

$$r(u,t) = \int_{u+ct}^{\infty} F(t,dx) + \int_0^{t} F(s, u + c\,ds)\,\bar r(0, t-s).$$

This formula is called Seal's formula and, if $F(t,dx)$ is known, it can be used
to calculate $r(u,t)$. As $u$ and $t$ become large, it can also be used to derive
an asymptotic formula using the Esscher approximation of $F(t,dx)$, but the
calculations become cumbersome.



Figure 8: The process U(t) divided in up-crossings {U_k}.



3.2.3 The ruin probability r(u) 

We will now derive a useful formula for $r(u) = P(T(u) < \infty)$. A conceivable
method for studying $r(u)$ would be to let $t \to \infty$ in Seal's formula for $r(u,t)$.
However, we will see that it is possible to obtain an interesting formula via a
more direct analysis, where the process $U(t)$ is divided into successive upcrossings,
$\{U_k\}$; see Figure 8. These upcrossings are defined as follows: Initially,
$U(0) = 0$. With probability $r := r(0)$, we have $U(t) > 0$ for some $t$, and
with probability $1 - r$, we have $U(t) \le 0$ for all $t$. In the first case, define $U_1$
to be the value of $U(t)$ just after it has exceeded 0 for the first time, that is,
$U_1 = U(T(0))$ if $T(0) < \infty$. From this point $U(t)$ goes on for $t > T(0)$, and the
process $U(T(0) + t) - U_1$ has the same distribution as $U(t)$ and is independent
of $U_1$. Define $U_2$ as the first upcrossing in this process, and so on. In each step,
there is a probability $1 - r$ that no more upcrossing occurs, and the successive
$U_k$'s become independent and identically distributed.

Below we will see that the $U_k$'s have a density $k(u)$ that is easy to write down
and that

$$r = \begin{cases} \lambda\mu/c & \text{if } c > \lambda\mu, \\ 1 & \text{if } c \le \lambda\mu. \end{cases}$$

Let $M$ denote the number of upcrossings and define $U = \sum_{k=1}^{M} U_k$. When $r < 1$,
$M$ has a geometric distribution with

$$P(M = m) = (1 - r)\,r^m, \qquad m = 0, 1, 2, \ldots.$$

This means that $M$ is finite with probability one and hence we can write
$U = \max_{t \ge 0} U(t)$. The ruin probability then becomes $r(u) = P(U > u)$ and this
probability can easily be expressed in terms of $r$ and $k(u)$. It turns out, namely,
that $U$ has a compound geometric density,

$$l(u) = (1 - r)\sum_{m=0}^{\infty} r^m k^{m*}(u), \tag{10}$$

where $k^{m*}(u)$ denotes the convolution of $k(u)$ with itself $m$ times (compare with
the compound Poisson distribution characterized in Proposition 2.1). This can
be seen by noting that, with probability $(1 - r)r^m$, we have $M = m$ and the
density of $U$ then becomes $k^{m*}(u)$. Summing over the possible values of $m$, we
get (10). The formula for $r(u)$ becomes

$$r(u) = \int_u^{\infty} l(y)\,dy, \qquad u > 0.$$

This formula is useful, since both $k(u)$ and $r$ can easily be calculated. To find
expressions for $k(u)$ and $r$, consider the first upcrossing $U := U_1$. Write $-V$ for
the value of $U(t)$ just before this upcrossing and let $W$ denote the time when
the upcrossing occurs; see the corresponding figure. By Lemma 3.1 the joint
distribution of $(U, V, W)$ is

$$P(U \in du, V \in dv, W \in dw) = P(U(w) \in -dv \text{ and } U(t) < 0 \text{ for } t < w) \cdot P(\text{a loss occurs in } dw) \cdot P(\text{the amount of loss} \in v + du)$$

$$= \frac{v}{cw}\,F(w, cw - dv)\,\lambda\,dw\,F(v + du).$$

The distribution of $(U, V)$ is obtained by integrating over $w$. Assuming that $F$
has a density $F'$ and making the substitution $x = cw - v$, we get

$$P(U \in du, V \in dv) = \int_{w > v/c} \frac{v}{cw}\,F'(w, cw - v)\,\lambda\,dw\,dv\,F'(u + v)\,du$$

$$= \lambda F'(u + v)\,du\,dv \int_0^{\infty} \frac{v}{x + v}\,F'\!\left(\frac{x+v}{c}, x\right)\frac{dx}{c} = \frac{\lambda}{c}\,F'(u + v)\,du\,dv \qquad \text{when } c > \lambda\mu,$$

where the last equality follows from (9). Hence the pair $(U, V)$ has density
$(\lambda/c)F'(u + v)$ for $u, v > 0$. Integrating this over $v$ gives

$$P(U \in du) = \frac{\lambda}{c}\,du \int_0^{\infty} F'(u + v)\,dv = \frac{\lambda}{c}\,(1 - F(u))\,du.$$

Since

$$\int_0^{\infty} (1 - F(u))\,du = \int_0^{\infty} u F'(u)\,du = \mu,$$

we normalize by $\mu$ to get

$$\frac{\lambda\mu}{c} \cdot \frac{1 - F(u)}{\mu}\,du = r\,k(u)\,du,$$

where $r = \lambda\mu/c < 1$ and $k(u) = (1 - F(u))/\mu$, with $\int_0^{\infty} k(u)\,du = 1$.

To summarize, we see that $P(U > 0) = r = \lambda\mu/c$, and that the conditional
density of $U$, given that $U > 0$, is $k(u) = (1 - F(u))/\mu$. Also, the ruin
probability is

$$r(u) = \int_u^{\infty} l(y)\,dy, \qquad u > 0,$$

where $l(u)$ is specified in (10). In risk theory, this formula is called Cramér's
formula, and in queuing theory - where it solves a similar problem - it is referred
to as Pollaczek-Khinchin's formula.

The formula for $l(u)$ gives rise to a corresponding equation for the generating
functions

$$\kappa(\xi) = \int_0^{\infty} e^{\xi u} k(u)\,du \tag{11}$$

and

$$\Lambda(\xi) = \int_0^{\infty} e^{\xi u} l(u)\,du, \tag{12}$$

which are defined at least for $\xi \le 0$. The equation for $l(u)$ corresponds to the
equation

$$\Lambda(\xi) = (1 - r)\sum_{m=0}^{\infty} r^m \kappa^m(\xi) = \frac{1 - r}{1 - r\kappa(\xi)}.$$

This equation can be used to compute $l(u)$ by inverting the generating function.

Example. Let $k(u) = e^{-u}$. Then

$$\kappa(\xi) = \int_0^{\infty} e^{\xi u} e^{-u}\,du = \frac{1}{1 - \xi},$$

which implies that

$$\Lambda(\xi) = \frac{1 - r}{1 - r/(1 - \xi)} = \frac{(1 - r)(1 - \xi)}{1 - r - \xi} = (1 - r) + \frac{r(1 - r)}{1 - r - \xi}.$$

This is the generating function of $l(u) = (1 - r)\delta(u) + r(1 - r)e^{-(1-r)u}$. The
density can be computed in a similar way when $\kappa(\xi)$ is a rational function
of $\xi$.
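
The compound geometric representation of $U$ lends itself to a direct simulation check of this example. The sketch below is an illustration only: the function names, the sample size and the seed are assumptions, and the up-crossing sizes are drawn from the density $k(u) = e^{-u}$ of the example.

```python
import math
import random


def simulate_ruin_probability(r, u, n_samples=200_000, seed=1):
    """Estimate r(u) = P(U > u), where U is a geometric sum of Exp(1) up-crossings."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        total = 0.0
        # A further up-crossing occurs with probability r, so P(M = m) = (1 - r) r^m.
        while rng.random() < r:
            total += rng.expovariate(1.0)   # up-crossing size with density e^{-u}
        if total > u:
            hits += 1
    return hits / n_samples


def exact_ruin_probability(r, u):
    """Tail integral of l(u) = (1 - r) delta(u) + r (1 - r) e^{-(1 - r) u}."""
    return r * math.exp(-(1.0 - r) * u)


# Illustrative values (assumptions): r = 0.5, u = 2.
print(simulate_ruin_probability(0.5, 2.0), exact_ruin_probability(0.5, 2.0))
```

The two printed numbers should agree up to Monte Carlo error, which is of the order of $10^{-3}$ for this sample size.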

3.2.4 Panjer-approximation of r(u) 

In the following two sections, two different approximations of $r(u)$ will be
derived. The first one is a kind of Panjer recursion and the second one is an
asymptotic formula as $u \to \infty$ similar to the Esscher approximation.

As for the Panjer recursion, consider first two discrete distributions $\{k_n\}_1^{\infty}$ and
$\{l_n\}_0^{\infty}$, where $\{k_n\}$ is known and $\{l_n\}$ is "compound geometric", that is,

$$l_n = (1 - r)\sum_{m=0}^{\infty} r^m k_n^{m*}, \qquad n = 0, 1, 2, \ldots.$$

We will now see that $\{l_n\}$ can be computed by aid of a recursive formula of
Panjer type. Convolving the equation $l = (1 - r)\sum_{m=0}^{\infty} r^m k^{m*}$ with $rk$ yields

$$rk * l = (1 - r)\sum_{m=0}^{\infty} r^{m+1} k^{(m+1)*} = (1 - r)\sum_{m=1}^{\infty} r^m k^{m*} = l - (1 - r)\delta,$$

where

$$\delta_n = \begin{cases} 1 & \text{if } n = 0, \\ 0 & \text{if } n > 0. \end{cases}$$

Hence we have the renewal equation $l = (1 - r)\delta + rk * l$, or, more explicitly,

$$l_n = (1 - r)\delta_n + r\sum_{m=1}^{n} k_m l_{n-m}.$$

The probabilities $\{l_n\}$ can successively be computed for $n = 0, 1, 2, \ldots$. We
obtain

$$l_0 = 1 - r, \qquad l_1 = r k_1 l_0, \qquad l_2 = r(k_1 l_1 + k_2 l_0), \qquad \ldots, \qquad l_n = r(k_1 l_{n-1} + \cdots + k_n l_0).$$

If $\{k_n\}$ is known, this recursion is easy to implement. Also, the probabilities
$r_n = \sum_{m=n+1}^{\infty} l_m$ can be computed parallel to $l_n$.

The equations for $l(u)$ and $r(u)$ are similar, but, just as $k(u)$, they refer to
continuous random variables. However, if we make a suitable discrete
approximation of $k(u)$, we can calculate the corresponding approximations of
$l(u)$ and $r(u)$ by the above method.
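
The recursion above can be sketched in a few lines. The function names and the test distribution are illustrative assumptions; the geometric choice $k_n = (1-p)p^{n-1}$ is used only because the compound geometric probabilities are then known in closed form, $l_n = (1-r)r(1-p)q^{n-1}$ with $q = p + r(1-p)$.

```python
def compound_geometric(k, r, n_max):
    """Return [l_0, ..., l_{n_max}] from l_n = (1 - r) delta_n + r sum_{m=1}^n k_m l_{n-m}.

    k is a list with k[0] unused (set to 0) and k[m] = k_m for m >= 1.
    """
    l = [1.0 - r]
    for n in range(1, n_max + 1):
        upper = min(n, len(k) - 1)
        l.append(r * sum(k[m] * l[n - m] for m in range(1, upper + 1)))
    return l


def tail_probabilities(l):
    """r_n = sum_{m > n} l_m = 1 - (l_0 + ... + l_n), computed alongside l_n."""
    tails, cumulative = [], 0.0
    for l_n in l:
        cumulative += l_n
        tails.append(1.0 - cumulative)
    return tails


# Illustrative values (assumptions): p = 0.3, r = 0.6.
p, r = 0.3, 0.6
q = p + r * (1 - p)
k = [0.0] + [(1 - p) * p ** (n - 1) for n in range(1, 41)]
l = compound_geometric(k, r, 40)
tails = tail_probabilities(l)
print(l[3], (1 - r) * r * (1 - p) * q ** 2)   # recursion vs closed form
print(tails[5], r * q ** 5)                   # tail vs closed form
```

Each step costs $O(n)$ operations, so computing $l_0, \ldots, l_n$ costs $O(n^2)$ in total.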

The relation between $\{k_n\}$ and $\{l_n\}$ can also be expressed in terms of the
generating functions $\hat k(s) = \sum_n k_n s^n$ and $\hat l(s) = \sum_n l_n s^n$. We have

$$\hat l(s) = (1 - r)\sum_{m=0}^{\infty} r^m \hat k^m(s) = \frac{1 - r}{1 - r\hat k(s)}.$$

If, for instance, $\hat k(s)$ is a rational function of $s$, the function $\hat l(s)$ is also rational
and $\{l_n\}$ can be obtained by partial fraction expansion.

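Example. Let $k_n = (1 - p)p^{n-1}$, $n = 1, 2, \ldots$. This gives

$$\hat k(s) = (1 - p)\,s\sum_{n=1}^{\infty} p^{n-1}s^{n-1} = \frac{(1 - p)s}{1 - ps},$$

that is,

$$1 - r\hat k(s) = 1 - \frac{r(1 - p)s}{1 - ps} = \frac{1 - qs}{1 - ps},$$

where $q = p + r(1 - p)$. Hence

$$\hat l(s) = (1 - r)\,\frac{1 - ps}{1 - qs} = (1 - r)\left(1 + \frac{(q - p)s}{1 - qs}\right) = (1 - r)\left(1 + \frac{r(1 - p)s}{1 - qs}\right).$$

This yields

$$l_0 = 1 - r, \qquad l_n = (1 - r)\,r\,(1 - p)\,q^{n-1}, \quad n \ge 1.$$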

A natural method for finding a discrete approximation to the density $k(x)$ can
be obtained as follows: Approximate first the distribution $F(x)$ by a discrete
distribution with masses $f_n$ for $x = nd$ and $n = 1, 2, \ldots$ and put $F_n = \sum_{m=1}^{n} f_m$.
For this distribution $F(x)$ is piecewise constant: $F(x) = F_n$ and
$1 - F(x) = 1 - F_n = \sum_{m=n+1}^{\infty} f_m$ for $nd \le x < (n+1)d$, and then
$\mu = d\sum_{n=0}^{\infty}(1 - F_n)$. The density $k(x) = (1 - F(x))/\mu$ can then be approximated
by a discrete distribution having masses

$$k_n = \int_{(n-1)d}^{nd} k(x)\,dx = \frac{d\,(1 - F_{n-1})}{\mu} = \frac{d}{\mu}\sum_{m=n}^{\infty} f_m$$

for $x = nd$ and $n = 1, 2, \ldots$. This distribution will have total mass one and is
located at positive $x$-values.
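
The discretization can be combined with the recursion of this section to approximate $r(u)$ on the grid $u = nd$. The sketch below is illustrative: the function name, the truncation point `n_tail` of the sum defining $\mu$, and the choice of exponential claims (for which $r(u) = r\,e^{-(1-r)u/\mu}$ is available for comparison) are all assumptions.

```python
import math


def discretized_ruin_probabilities(sf, r, d, n_max, n_tail=100_000):
    """Approximate r(n d), n = 0..n_max, given the claim survival function sf(x) = 1 - F(x)."""
    # Mean of the discretized claim distribution: mu = d * sum_{n >= 0} (1 - F_n),
    # truncated after n_tail terms.
    mu_d = d * sum(sf(n * d) for n in range(n_tail))
    # k_n = d (1 - F_{n-1}) / mu, a mass placed at x = n d.
    k = [0.0] + [d * sf((n - 1) * d) / mu_d for n in range(1, n_max + 1)]
    # Recursion l_n = r (k_1 l_{n-1} + ... + k_n l_0), starting from l_0 = 1 - r.
    l = [1.0 - r]
    for n in range(1, n_max + 1):
        l.append(r * sum(k[m] * l[n - m] for m in range(1, n + 1)))
    # r_n = 1 - (l_0 + ... + l_n).
    tails, cumulative = [], 0.0
    for l_n in l:
        cumulative += l_n
        tails.append(1.0 - cumulative)
    return tails


# Illustrative parameters (assumptions): Exp(1) claims, lambda = 1, c = 2, so r = 0.5.
r, d = 0.5, 0.01
approx = discretized_ruin_probabilities(lambda x: math.exp(-x), r, d, n_max=500)
for u in (0.0, 1.0, 2.0, 5.0):
    n = round(u / d)
    print(u, approx[n], r * math.exp(-(1.0 - r) * u))
```

Because the mass of each interval is moved to its right endpoint, the approximation tends to lie slightly above the exact value; halving `d` roughly halves the gap.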




3.2.5 Cramér-Lundberg's approximation of r(u)

We will now derive a more explicit approximation formula for $r(u)$. It is an
asymptotic formula valid as $u \to \infty$ and, as we will see, it is closely related to
the Esscher approximation.

First recall from Section 3.2.3 that

$$r(u) = \int_u^{\infty} l(y)\,dy \qquad \text{for } u > 0,$$

where $l(y) = (1 - r)\sum_{m=0}^{\infty} r^m k^{m*}(y)$, $r = \lambda\mu/c$ and $k(u) = (1 - F(u))/\mu$. Here
$r = P(U > 0)$, where $U$ denotes the size of an upcrossing, and $k(u)$ is the
conditional density of $U$ given that $U > 0$. To get an approximation of $l(y)$
when $y$ is large, we need an approximation of $k^{m*}(y)$ as $y \to \infty$. Since $r^m$
damps large $m$-values in the formula for $l(y)$ we only have to consider moderate
values of $m$. The desired approximation is obtained by introducing a modified
density $k_\alpha(y)$ as in Section 2.4.2, and choosing $\alpha$ suitably. We have

$$k_\alpha(y) = \frac{e^{\alpha y} k(y)}{\kappa(\alpha)},$$

where $\kappa$ is the generating function of the density $k$ (see (11)), and below we will
see that $\kappa(\alpha)$ is finite for the same values of $\alpha$ as $g(\alpha)$. As in Section 2.4.2, we get

$$k_\alpha^{m*}(y) = \frac{e^{\alpha y} k^{m*}(y)}{\kappa^m(\alpha)},$$

so that

$$k^{m*}(y) = e^{-\alpha y}\kappa^m(\alpha)\,k_\alpha^{m*}(y)$$

and hence

$$l(y) = e^{-\alpha y}(1 - r)\sum_{m=0}^{\infty} r^m \kappa^m(\alpha)\,k_\alpha^{m*}(y).$$

Choosing $\alpha$ such that $r\kappa(\alpha) = 1$ yields

$$l(y) = e^{-\alpha y}(1 - r)\sum_{m=0}^{\infty} k_\alpha^{m*}(y).$$

As $y \to \infty$, this expression can be approximated using the so-called renewal
theorem, which is an important result in renewal theory. It states that, as
$y \to \infty$, the sum $\sum_{m=0}^{\infty} k_\alpha^{m*}(y)$ can be approximated by a uniform density with
intensity $1/m_\alpha$, where

$$m_\alpha = \int_0^{\infty} y\,k_\alpha(y)\,dy = \int_0^{\infty} \frac{y\,e^{\alpha y} k(y)}{\kappa(\alpha)}\,dy = \frac{\kappa'(\alpha)}{\kappa(\alpha)}.$$

Substituting this approximation in the formula for $r(u)$ gives

$$r(u) \approx \int_u^{\infty} e^{-\alpha y}(1 - r)\,\frac{dy}{m_\alpha} = e^{-\alpha u}\,\frac{1 - r}{m_\alpha}\int_0^{\infty} e^{-\alpha y}\,dy = \frac{(1 - r)\,e^{-\alpha u}}{\alpha m_\alpha}.$$

If $\alpha$ is known, this is a simple exponential approximation.

Figure 10: Construction of R.



The equation for $\alpha$, $r\kappa(\alpha) = 1$, can be expressed more explicitly in terms of $g(\alpha)$,
by noting that

$$\kappa(\alpha) = \int_0^{\infty} e^{\alpha u}\,\frac{1 - F(u)}{\mu}\,du = \frac{1}{\alpha\mu}\int_0^{\infty}\left(e^{\alpha u} - 1\right)F(du) = \frac{g(\alpha)}{\alpha\lambda\mu},$$

where the second equality is obtained by partial integration. The equation for $\alpha$
hence becomes $g(\alpha)/(\alpha\lambda\mu) = 1/r = c/(\lambda\mu)$, that is, $g(\alpha) = c\alpha$. Recall from Section
2.4.1 that $g(\xi)$ is strictly convex with $g(0) = 0$ and $g'(0) = \lambda\mu$. We are looking
for the intersection with a line $c\xi$ with slope $c$; see Figure 10. For $c > g'(0) = \lambda\mu$,
there is a strictly positive root, which is denoted by $R$ and referred to as the
Lundberg exponent. For $c < \lambda\mu$, the root is negative.

To find an expression for the constant $C := (1 - r)/(\alpha m_\alpha)$ in the formula for $r(u)$,
note that, using $g(\alpha) = c\alpha$ in the last step,

$$m_\alpha = \frac{\kappa'(\alpha)}{\kappa(\alpha)} = [\log\kappa(\alpha)]' = [\log g(\alpha)]' - [\log\alpha]' = \frac{g'(\alpha)}{g(\alpha)} - \frac{1}{\alpha} = \frac{g'(\alpha) - c}{c\alpha}.$$

This yields

$$C = \frac{1 - r}{\alpha m_\alpha} = \frac{c(1 - r)}{g'(\alpha) - c} = \frac{c - g'(0)}{g'(\alpha) - c}.$$

To sum up, we have deduced that $r(u) \approx Ce^{-Ru}$, where $R$ is the positive root
of the equation $g(\alpha) = c\alpha$ and $C = (c - g'(0))/(g'(R) - c)$. Here $\approx$ means
that the quotient between the right hand and the left hand side tends to 1 as
$u \to \infty$. A natural way of using these formulas for the design of a system is
to start by choosing $c$ so that $r$ has a suitable value close enough to one, and
then finding the corresponding values of $R$ and $C$. Then $u$ can easily be found
so that $Ce^{-Ru}$ has a value considered to be small enough to be safe.
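
As a minimal sketch of how $R$ and $C$ might be computed in practice, the code below finds the positive root of $g(\xi) = c\xi$ by bisection on the increasing function $g(\xi)/\xi$. The function names and the parameter values are assumptions, and exponential claims with mean `mu` are used only because then $g(\xi) = \lambda\mu\xi/(1 - \mu\xi)$, $R = (1 - r)/\mu$ and $C = r$ are known in closed form.

```python
def lundberg_exponent(g, c, xi_hi, iterations=200):
    """Positive root of g(xi) = c * xi on (0, xi_hi), assuming c > g'(0).

    Uses that g(xi) / xi is increasing for convex g with g(0) = 0, and that
    g(xi_hi) / xi_hi > c at the supplied upper end point.
    """
    lo, hi = 0.0, xi_hi
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if g(mid) / mid < c:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def lundberg_constant(g_prime, c, R):
    """C = (c - g'(0)) / (g'(R) - c)."""
    return (c - g_prime(0.0)) / (g_prime(R) - c)


# Illustrative parameters (assumptions): lambda = 1, Exp claims with mean mu = 1, c = 2.
lam, mu, c = 1.0, 1.0, 2.0
g = lambda xi: lam * mu * xi / (1.0 - mu * xi)
g_prime = lambda xi: lam * mu / (1.0 - mu * xi) ** 2

R = lundberg_exponent(g, c, xi_hi=(1.0 - 1e-9) / mu)
C = lundberg_constant(g_prime, c, R)
print(R, C)   # closed form for this case: R = (1 - r)/mu and C = r with r = lam*mu/c
```

Given a tolerated ruin probability $\varepsilon$, the required initial capital then follows from $Ce^{-Ru} = \varepsilon$, that is, $u = \log(C/\varepsilon)/R$.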
Example. Approximate calculation of $R$ and $C$ when $r = \lambda\mu/c$ is close to one.
The equation $g(R)/R = c$ can be expressed in terms of the Taylor expansion of
$g(R)$ as follows:

$$g(R) = \lambda\sum_{k=1}^{\infty} \mu_k R^k/k!,$$

where $\mu_k$ is the $k$-th moment of the claims distribution $F$ ($\mu_1 = \mu$ and $\mu_2 = \nu$).
In terms of it the equation for $R$ is hence

$$\lambda(\mu_1 + \mu_2 R/2 + \mu_3 R^2/6 + \cdots) = c$$

or

$$\lambda(\mu_2 R/2 + \mu_3 R^2/6 + \cdots) = \lambda\mu_1(1/r - 1),$$

and we see that $r \approx 1$ corresponds to $\rho = (1/r - 1) \approx 0$ and hence to $R \approx 0$. To
first order in $\rho$ we hence have $\mu_2 R_1/2 = \mu_1\rho$ and $R_1 = (2\mu_1/\mu_2)\rho$. To second
order in $\rho$ we then have $\mu_2 R_2/2 + \mu_3 R_1^2/6 = \mu_1\rho$ and

$$R_2 = R_1 - (2/\mu_2)(\mu_3/6)(2\mu_1/\mu_2)^2\rho^2 = (2\mu_1/\mu_2)\rho - (4/3)(\mu_3\mu_1^2/\mu_2^3)\rho^2$$

etc. The corresponding values of $C$ can be obtained from the relation

$$C = \frac{g(R)/R - \lambda\mu}{g'(R) - g(R)/R} = \frac{\sum_{k=2}^{\infty}\mu_k R^{k-1}/k!}{\sum_{k=2}^{\infty}\mu_k R^{k-1}(k-1)/k!} = \frac{\sum_{k=2}^{\infty}\mu_k R^{k-2}/k!}{\sum_{k=2}^{\infty}\mu_k R^{k-2}(k-1)/k!}$$

$$= \frac{(\mu_2/2) + (\mu_3/6)R + (\mu_4/24)R^2 + \cdots}{(\mu_2/2) + (\mu_3/3)R + (\mu_4/8)R^2 + \cdots}.$$

To first order in $\rho$ we hence have $C_1 = ((\mu_2/2) + (\mu_3/6)R_1)/((\mu_2/2) + (\mu_3/3)R_1)$,
and we get quite explicit expressions in terms of the moments $\mu_k$ and $\rho$.

Example. Assume that $F'(x)$ is a weighted sum of exponential densities, that
is,

$$F'(x) = \sum_{i=1}^{n} a_i b_i e^{-b_i x},$$

where $a_i > 0$, $\sum_{i=1}^{n} a_i = 1$ and $0 < b_1 < b_2 < \cdots < b_n$. Then $r(u)$ can be
calculated fairly explicitly via the generating functions $\kappa(\xi)$ and $\Lambda(\xi)$, defined
in (11) and (12) respectively, and we will be able to see how the approximation
$Ce^{-Ru}$ arises. First recall that $\kappa(\xi) = g(\xi)/(\xi\lambda\mu)$ and $\Lambda(\xi) = (1 - r)/(1 - r\kappa(\xi))$.
The generating function, $f(\xi)$, of $F'(x)$ is

$$f(\xi) = \sum_{i=1}^{n} \frac{a_i b_i}{b_i - \xi}$$

and we obtain

$$g(\xi) = \lambda(f(\xi) - 1) = \lambda\xi\sum_{i=1}^{n} \frac{a_i}{b_i - \xi} \tag{13}$$

and $\mu = \sum_{i=1}^{n} a_i/b_i$. Hence

$$\kappa(\xi) = \frac{1}{\mu}\sum_{i=1}^{n} \frac{a_i}{b_i - \xi},$$

that is, $\kappa(\xi)$ is a rational function of $\xi$, where the denominator is of degree
$n$, and $\kappa(\xi) \to 0$ as $|\xi| \to \infty$. The poles of $\Lambda(\xi)$ - that is, the zeroes of its
denominator - are the roots of the equation $1 - r\kappa(\xi) = 0$. If the root $\xi = 0$
is ignored, this equation can be rewritten as $1 = rg(\xi)/(\lambda\mu\xi)$, that is, $g(\xi) = c\xi$.
Using the relation (13), the equation becomes

$$\lambda\sum_{i=1}^{n} \frac{a_i}{b_i - \xi} = c.$$

For $\xi = 0$, the left hand side equals $\lambda\mu$ which is strictly smaller than $c$. A graph
of the expression on the left hand side as a function of $\xi \ge 0$ is displayed in
Figure 11. We see that there are $n$ real roots $R_1, \ldots, R_n$, with $0 < R_1 < b_1 <
R_2 < b_2 < \cdots < R_n < b_n$. Hence, the partial fraction expansion of $\Lambda(\xi)$ is

$$\Lambda(\xi) = (1 - r) + \sum_{i=1}^{n} \frac{C_i R_i}{R_i - \xi},$$

where the coefficients $C_i R_i$ are determined by the formula

Figure 11: Graphical picture of the roots {R_i}.



$$C_i R_i = \lim_{\xi \to R_i} (R_i - \xi)\,\Lambda(\xi) = \lim_{\xi \to R_i} \frac{(R_i - \xi)(1 - r)}{1 - rg(\xi)/(\lambda\mu\xi)}.$$

Near $\xi = R_i$, we have for the denominator, that

$$g(\xi) - c\xi \approx g(R_i) - cR_i + (\xi - R_i)(g'(R_i) - c) = (\xi - R_i)(g'(R_i) - c),$$

since $g(R_i) = cR_i$. Hence $C_i = c(1 - r)/(g'(R_i) - c)$ and, using the formula for
$\Lambda(\xi)$, we obtain

$$l(x) = (1 - r)\delta(x) + \sum_{i=1}^{n} C_i R_i e^{-R_i x}$$

and

$$r(u) = \int_u^{\infty} l(x)\,dx = \sum_{i=1}^{n} C_i e^{-R_i u}.$$

This is an elegant generalization of Cramér-Lundberg's formula, which is
obtained when only the contribution from $R_1 = R$ is included. Since $R_1 < b_1 <
R_2 < \cdots < R_n$, we see that the first term $Ce^{-Ru}$ dominates, as expected.
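
The roots $R_i$ and coefficients $C_i$ of this example can be computed numerically by bisection on each interval between consecutive $b_i$. The sketch below is illustrative: the function names and the two-component mixture are assumptions, not taken from the text. A convenient sanity check is that $\sum_i C_i = r(0) = r$.

```python
import math


def hyperexponential_ruin(a, b, lam, c, iterations=200):
    """Roots R_i and coefficients C_i for claim density sum_i a_i b_i exp(-b_i x).

    Assumes b is sorted increasingly, sum(a) == 1 and c > lam * mu.
    """
    mu = sum(ai / bi for ai, bi in zip(a, b))
    r = lam * mu / c

    def phi(xi):       # left hand side of lam * sum_i a_i / (b_i - xi) = c
        return lam * sum(ai / (bi - xi) for ai, bi in zip(a, b))

    def g_prime(xi):   # derivative of g(xi) = lam * xi * sum_i a_i / (b_i - xi)
        return lam * sum(ai * bi / (bi - xi) ** 2 for ai, bi in zip(a, b))

    roots = []
    edges = [0.0] + list(b)
    for left, right in zip(edges[:-1], edges[1:]):
        lo, hi = left, right       # phi is increasing on each interval
        for _ in range(iterations):
            mid = 0.5 * (lo + hi)
            if phi(mid) < c:
                lo = mid
            else:
                hi = mid
        roots.append(0.5 * (lo + hi))
    coeffs = [c * (1.0 - r) / (g_prime(R) - c) for R in roots]
    return roots, coeffs


def ruin_probability(u, roots, coeffs):
    """r(u) = sum_i C_i exp(-R_i u)."""
    return sum(C * math.exp(-R * u) for R, C in zip(roots, coeffs))


# Illustrative mixture (assumption): a = (0.5, 0.5), b = (1, 3), lambda = 1, c = 1.
roots, coeffs = hyperexponential_ruin([0.5, 0.5], [1.0, 3.0], lam=1.0, c=1.0)
print(roots, coeffs, ruin_probability(0.0, roots, coeffs))
```

For this mixture the equation for the roots reduces to $\xi^2 - 3\xi + 1 = 0$, so the roots are $(3 \pm \sqrt 5)/2$, and $r = \lambda\mu/c = 2/3$.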

3.2.6 An alternative derivation of Cramér's formula for r(u)

In the derivation of the formula for $r(u)$, the process $U(t)$ was divided into
successive upcrossings. This gave a natural probabilistic interpretation of the
quantities $r$ and $k(y)$. The traditional method for determining $r(u)$ is to derive
an integral equation that is well known in renewal theory and to show that its
solution is given by Cramér's formula. Although it does not provide the same
insight concerning the probabilistic structure of the solution, this method has
the advantage of being more direct. Also, it can be generalized to the case when
$c$ depends on the value of $U(t)$, which is indeed a natural extension. For the
sake of completeness, we also describe this analytic derivation.

We are looking for an equation for r(u) as a function of u, that is based on an 
analysis of what can happen in a small interval (0,h) just after t = 0. Such 
equations are common in the more general theory for Markov processes and are 
referred to as backward equations. There are basically two possible scenarios 
that can occur in the interval (0, h): 

1. With probability $e^{-\lambda h} \approx 1 - \lambda h$ no loss occurs. At time $h$ we then have
$U(h) = -ch$ and ruin has not yet occurred. Looking ahead from $h$, the
ruin probability is $r(u + ch)$, since the surplus has increased by $ch$ in the
interval $(0, h)$.

2. With probability $\approx \lambda h$ a loss occurs in $(0, h)$. Let $x$ denote the amount of
loss. If $x > u$, ruin occurs immediately. If $x \le u$, we have $U(h) \approx x$ and
ruin has not yet occurred. Looking ahead from $h$, the ruin probability is
$r(u - x)$, since the surplus has decreased by $x$ in the interval $(0, h)$.

(3.) With probability $O(h^2)$, more than one loss occurs in $(0, h)$. As $h \to 0$, this
possibility can be excluded.

Combining this gives

$$r(u) = (1 - \lambda h)\,r(u + ch) + \lambda h\int_0^{u} r(u - x)\,F(dx) + \lambda h\int_u^{\infty} F(dx) + O(h^2)$$

as $h \to 0$. If we assume that $r'(u)$ exists, we get the equation

$$c\,r'(u) - \lambda r(u) + \lambda\int_0^{u} r(u - x)\,F(dx) + \lambda\bar F(u) = 0,$$

where $\bar F(u) = 1 - F(u)$. We will solve this equation with the boundary condition
that $r(u) \to 0$ as $u \to \infty$ if $c > \lambda\mu$. A difficulty is that both $r(u)$ and $r'(u)$ are
included in the equation. However, it turns out that $r(u)$ can be eliminated by
partial integration of the third term. Using the fact that $d\bar F(x) = -F(dx)$, we
get

$$\int_0^{u} r(u - x)\,F(dx) = r(u) - r(0)\bar F(u) - \int_0^{u} r'(u - x)\,\bar F(x)\,dx.$$

Substituting this in the above equation yields

$$c\,r'(u) = \lambda\int_0^{u} r'(u - x)\,\bar F(x)\,dx + \lambda(r(0) - 1)\bar F(u).$$

Here $r(0)$ is a constant that can be determined from the boundary condition.
This is an equation that involves only $r'(u)$. To solve it, introduce $r = \lambda\mu/c$
and $k(u) = (1 - F(u))/\mu = \bar F(u)/\mu$. The equation then becomes

$$r'(u) = r\int_0^{u} r'(u - x)\,k(x)\,dx + r(r(0) - 1)\,k(u).$$

This is a renewal equation that, by a convolution operation, can be written as

$$r'(u) = r\,(k * r')(u) + r(r(0) - 1)\,k(u).$$

This equation can be solved by an iteration that converges when $r < 1$, that is,
when $c > \lambda\mu$. We have

$$r'(u) = r(r(0) - 1)\left(\sum_{m=0}^{\infty} r^m k^{m*}\right) * k(u) = (r(0) - 1)\sum_{m=1}^{\infty} r^m k^{m*}(u).$$

According to the boundary condition, $\int_0^{\infty} r'(u)\,du = r(\infty) - r(0) = -r(0)$, and,
since $\int_0^{\infty} k^{m*}(u)\,du = 1$ for all $m$, we have

$$\int_0^{\infty}\sum_{m=1}^{\infty} r^m k^{m*}(u)\,du = \sum_{m=1}^{\infty} r^m = \frac{r}{1 - r}.$$

Using these relations, we obtain $-r(0) = (r(0) - 1) \cdot r/(1 - r)$, that is, $r(0) = r$.
Hence, since $l(u) = (1 - r)\sum_{m=0}^{\infty} r^m k^{m*}(u)$ and $k^{0*}(u) = \delta(u) = 0$ when $u > 0$,
we get

$$r'(u) = -(1 - r)\sum_{m=1}^{\infty} r^m k^{m*}(u) = -l(u) \qquad \text{for } u > 0.$$

By integration it follows that $r(u) = \int_u^{\infty} l(y)\,dy$. This is the same formula that
was derived in 3.2.3.

3.2.7 Approximation of r(u,t)

So far we have been concerned with $r(u) = P(T(u) < \infty)$. However, it is also
interesting to study when ruin occurs if $T(u) < \infty$. In the following we will
prove a law of large numbers for $T(u)$, that states that, if $T(u) < \infty$, then
with large probability $T(u) \approx u\hat t$ as $u \to \infty$, where $\hat t$ is given by a formula that
includes the Lundberg exponent $R$. We will show that exponential inequalities,
analogous to the ones for $P(S(t) > tx)$, hold for $P(T(u) \le ut)$ when $t < \hat t$ and
for $P(ut < T(u) < \infty)$ when $t > \hat t$. The exponent can be expressed in terms of
the function $h(x)$.

In the derivation of Chernoff's inequality, $P(S(t) > tx) \le e^{-th(x)}$ for $x > \lambda\mu$, in
Section 2.4.1, we started with the relation $E[e^{\xi S(t)}] = e^{tg(\xi)}$ and picked a suitable
value of $\xi$, depending on $x$. This relation can be written as $E[e^{\xi S(t) - tg(\xi)}] = 1$
for all $t$. We will first show that, for some $\xi$-values, this relation holds also for
the stochastic times $T(u)$ and $T(-u)$ so that, for instance,

$$E\left[e^{\xi S(T(u)) - T(u)g(\xi)};\ T(u) < \infty\right] = 1. \tag{14}$$

Starting from this equation, which is called Wald's identity, we will derive
inequalities for $T(u)$ analogous to the Chernoff bounds.

Proof of Wald's identity:

Since the process $S(t)$ has independent increments, for $0 \le s \le t$, we have that
$S(t) - S(s)$ is independent of all events $A_s$ and stochastic variables that concern
the values of the process up to time $s$. This implies that

$$E\left[e^{\xi S(t) - tg(\xi)};\ A_s\right] = E\left[e^{\xi(S(t) - S(s)) - (t-s)g(\xi)} \cdot e^{\xi S(s) - sg(\xi)};\ A_s\right]$$

$$= E\left[e^{\xi(S(t) - S(s)) - (t-s)g(\xi)}\right] \cdot E\left[e^{\xi S(s) - sg(\xi)};\ A_s\right] = E\left[e^{\xi S(s) - sg(\xi)};\ A_s\right],$$

where the last equality comes from the fact that $S(t) - S(s)$ has the same
distribution as $S(t - s)$, so that $E[e^{\xi(S(t) - S(s))}] = e^{(t-s)g(\xi)}$. Now let
$A_s = \{T(u) \in (s - ds, s]\}$. Note that when the values of the process up to time
$s$ are given, we can decide if $A_s$ has occurred or not. We get

$$E\left[e^{\xi S(t) - tg(\xi)};\ T(u) \in ds\right] = E\left[e^{\xi S(s) - sg(\xi)};\ T(u) \in ds\right] = E\left[e^{\xi S(T(u)) - T(u)g(\xi)};\ T(u) \in ds\right].$$

In the last equality we have used the facts that, since $S(t)$ is right-continuous at
the jump points, the difference between $S(T(u))$ and $S(s)$ is at most $c\,ds$ when
$s - T(u) < ds$, and that, with large probability, at most one jump occurs in $ds$.
Integrating the above relation for $s \in (0, t]$ yields

$$E\left[e^{\xi S(t) - tg(\xi)};\ T(u) \le t\right] = E\left[e^{\xi S(T(u)) - T(u)g(\xi)};\ T(u) \le t\right].$$

Recalling the definition of the Esscher-transformed distribution of $S(t)$ in Section
2.4.2, we see that the left hand side can be written as $P_\xi(T(u) \le t)$, where
$P_\xi(\cdot)$ is the transformed measure. To establish Wald's identity we have to show
that this tends to 1 as $t \to \infty$. To this end, remember that $S(t)$ is still a
compound Poisson process under the measure $P_\xi$ but the mean is changed to
$E_\xi[S(t)] = tg'(\xi)$. Hence $E_\xi[U(t)] = t(g'(\xi) - c)$. If $\xi$ is chosen so that this is
strictly positive - that is, so that $g'(\xi) > c$ - then, by the law of large numbers,
$U(t) \to \infty$ with $P_\xi$-probability 1 and it follows that $P_\xi(T(u) < \infty) = 1$. Since
$g'(\xi)$ is an increasing function of $\xi$, the condition that $g'(\xi) > c$ is fulfilled for
$\xi > \xi_c$, where $\xi_c$ satisfies $g'(\xi_c) = c$.

To summarize, we have shown that (14) holds for $\xi > \xi_c$, where $\xi_c$ is defined
via the relation $g'(\xi_c) = c$. Analogously it can be shown for $T(-u)$ that

$$E\left[e^{\xi S(T(-u)) - T(-u)g(\xi)};\ T(-u) < \infty\right] = 1$$

if $\xi < \xi_c$, since then the drift is strictly negative and $P_\xi(T(-u) < \infty) = 1$. □

From the picture of the definition of the Lundberg exponent $R$ in Figure 10 it
can be seen that $0 < \xi_c < R$ if $c > \lambda\mu$ and $R < \xi_c < 0$ if $c < \lambda\mu$. We will now
see how Wald's identity can be used to study $T(u)$ for $c > \lambda\mu$. First remember
that $T(u)$ is defined as the first time when $U(t) > u$, that is, the first time when
$S(t) > u + ct$. Together with Wald's identity this yields that, for $\xi > \xi_c > 0$,

$$1 \ge E\left[e^{\xi(u + cT(u)) - g(\xi)T(u)};\ T(u) < \infty\right],$$

that is,

$$e^{-\xi u} \ge E\left[e^{(c\xi - g(\xi))T(u)};\ T(u) < \infty\right].$$

In particular, for $\xi = R$ we get

$$P(T(u) < \infty) = r(u) \le e^{-Ru},$$

which is referred to as Lundberg's inequality. This inequality holds for all $u \ge 0$
and the exponent is the same as in the asymptotic approximation in Section
3.2.5.
The above inequality can be used to estimate $P(T(u) \le ut)$ (compare with the
Chernoff bound from Section 2.4.1). If $c\xi - g(\xi) \le 0$, when $T(u) \le ut$, we have
$(c\xi - g(\xi))T(u) \ge (c\xi - g(\xi))ut$ so that

$$e^{-\xi u} \ge E\left[e^{(c\xi - g(\xi))T(u)};\ T(u) \le ut\right] \ge e^{(c\xi - g(\xi))ut}\,P(T(u) \le ut),$$

and hence

$$P(T(u) \le ut) \le e^{-\xi u - ut(c\xi - g(\xi))} = e^{-ut((c + 1/t)\xi - g(\xi))}.$$

To get the best possible estimate we want to minimize the exponent, that is, to
maximize $(c + 1/t)\xi - g(\xi)$. This is done by picking $\xi$ such that $g'(\xi) = c + 1/t$ and,
recalling the definition of $h(x)$ from Section 2.4.1, we get

$$P(T(u) \le ut) \le e^{-uth(c + 1/t)}.$$

The above calculations are valid under the assumption that $c\xi - g(\xi) \le 0$, that is,
$R \le \xi$, where $\xi$ is defined by $g'(\xi) = c + 1/t$. Since $g'(\xi)$ is strictly increasing, the
condition that $\xi \ge R$ is equivalent to $g'(\xi) \ge g'(R)$, that is, to $1/t \ge g'(R) - c$.
If we introduce $\hat t$, defined by the relation $1/\hat t = g'(R) - c$, we see that the above
estimate holds for $t \le \hat t$.

Analogously, if we pick $\xi$ such that $\xi_c < \xi$ and $c\xi - g(\xi) \ge 0$ we can estimate
$P(ut < T(u) < \infty)$. We get

$$P(ut < T(u) < \infty) \le e^{-ut((c + 1/t)\xi - g(\xi))}.$$

If $g'(\xi) = c + 1/t$ and $\xi_c < \xi \le R$, that is, if $c = g'(\xi_c) < c + 1/t \le g'(R)$, or,
equivalently, $0 < 1/t \le 1/\hat t$, that is, $t \ge \hat t$, then it follows that

$$P(ut < T(u) < \infty) \le e^{-uth(c + 1/t)}.$$

For $\xi = R$ we get $t = \hat t$ and the exponent then becomes
$\hat t h(c + 1/\hat t) = \hat t((c + 1/\hat t)R - g(R)) = R$, since $cR = g(R)$.

To summarize, we have shown that there is a time $\hat t$, defined by the relation
$1/\hat t = g'(R) - c$, such that the deviations from $u\hat t$ can be estimated by

$$P(T(u) \le ut) \le e^{-uth(c + 1/t)} \qquad \text{for } t \le \hat t,$$

and

$$P(ut < T(u) < \infty) \le e^{-uth(c + 1/t)} \qquad \text{for } t \ge \hat t,$$

and $th(c + 1/t) = R$ for $t = \hat t$.

We will soon see that the exponent $H(t) := th(c + 1/t)$ is a strictly convex
function of $t$ with $\min_t H(t) = H(\hat t) = R$. This fact makes it possible to
study $T(u)$ when $T(u) < \infty$. Assume for example that $t < \hat t$. By Cramér's
approximation, as $u \to \infty$ we then have

$$P(T(u) \le ut \mid T(u) < \infty) = \frac{P(T(u) \le ut)}{r(u)} \le \frac{e^{-uH(t)}}{r(u)} \approx \frac{e^{-u(H(t) - R)}}{C}.$$

This tends to 0 exponentially fast, since $H(t) > R$ for $t < \hat t$. Analogously it can
be seen that $P(ut < T(u) < \infty \mid T(u) < \infty)$ tends to 0 exponentially fast when
$t > \hat t$ and $u \to \infty$. This has the following important interpretation: When $t > \hat t$,
the ruin probability $r(u, ut)$ can be approximated by $r(u) \approx Ce^{-Ru}$, and when
$t < \hat t$, we have $r(u, ut) \ll r(u)$, since $r(u, ut) \le e^{-uH(t)}$ and $r(u) \approx Ce^{-Ru}$
with $H(t) > R$.

Proof of the convexity of H(t):

We have $H(t) = t((c + 1/t)\xi - g(\xi)) = t(c\xi - g(\xi)) + \xi$, with $g'(\xi) = c + 1/t$.
When $t$ varies, $\xi$ varies as well, and we get

$$dH = (c\xi - g(\xi))\,dt + (tc - tg'(\xi) + 1)\,d\xi = (c\xi - g(\xi))\,dt.$$

Hence $H'(t) = dH/dt = c\xi - g(\xi)$ and

$$H''(t) = (c - g'(\xi))\,\frac{d\xi}{dt} = -\frac{1}{t} \cdot \frac{d\xi}{dt}.$$

From the equation for $\xi$ it follows that $-dt/t^2 = g''(\xi)\,d\xi$, that is,

$$\frac{d\xi}{dt} = -\frac{1}{t^2 g''(\xi)} < 0.$$

Thus $H''(t) > 0$ and we have shown that $H(t)$ is strictly convex. The function
$H(t)$ attains its smallest value when $H'(t) = c\xi - g(\xi) = 0$, that is, when $\xi = R$
and $t = \hat t$. As described above, we then have $H(\hat t) = \hat t(cR - g(R)) + R = R$. □
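
The behaviour of $H(t)$ can be illustrated numerically. The sketch below assumes exponentially distributed claims with mean 1 and $\lambda = 1$, $c = 2$ (illustrative choices, not made in the text), for which $g(\xi) = \xi/(1 - \xi)$, $R = 1/2$ and $\hat t = 1/(g'(R) - c) = 1/2$; the function name `H` and its arguments are hypothetical.

```python
import math


def H(t, g, g_prime, c, xi_lo, xi_hi, iterations=200):
    """H(t) = t * h(c + 1/t), with h(x) = x*xi - g(xi) evaluated where g'(xi) = x."""
    x = c + 1.0 / t
    lo, hi = xi_lo, xi_hi
    for _ in range(iterations):      # bisection on the increasing function g'
        mid = 0.5 * (lo + hi)
        if g_prime(mid) < x:
            lo = mid
        else:
            hi = mid
    xi = 0.5 * (lo + hi)
    return t * (x * xi - g(xi))


# Illustrative model (assumption): lambda = 1, Exp(1) claims, c = 2.
c = 2.0
g = lambda xi: xi / (1.0 - xi)
g_prime = lambda xi: 1.0 / (1.0 - xi) ** 2
R = 0.5                              # positive root of g(R) = c R for this model
t_hat = 1.0 / (g_prime(R) - c)       # 1 / t_hat = g'(R) - c

for t in (0.25, t_hat, 1.0, 2.0):
    print(t, H(t, g, g_prime, c, 0.0, 1.0 - 1e-9))
```

For this model $h(x) = (\sqrt x - 1)^2$, so $H(t) = t(\sqrt{c + 1/t} - 1)^2$; the printed values have their minimum $R = 1/2$ at $t = \hat t = 1/2$.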



3.2.8 Approximation of r(— u, t) 

We will now show that the above estimates of r(u,t) also hold for r(— u, t) = 
P(T(—u) < t) with small modifications when c < A/i. As we have seen, in this 
case we have R < £ c < and it follows from Wald's identity that 



E 



e WT(-„))-T(-u) fl «) jT( _ u)<00 - 



= 1 



for £ < £ c . An interesting difference as compared to the previous case is that at 
the time of ruin we now have S(T(— u)) = —u + cT(—u). This means that 



E 



e -*+(<*-fl(0)T(-«) )T (_ u ) <00 



1. 



This is an equation for the generating function of $T(-u)$: Put $w = c\xi - g(\xi)$ for
$\xi < \xi_c$. Since $dw/d\xi = c - g'(\xi) > 0$ so that $\xi := R(w)$ is uniquely determined, this
is a 1-1 relation. Hence

$$
E\left[e^{wT(-u)};\ T(-u) < \infty\right] = e^{R(w)u}.
$$

For $w = 0$ we have $R(0) = R < 0$ which gives the exact relation

$$
P(T(-u) < \infty) = e^{Ru} = e^{-|R|u},
$$

where $P(T(-u) < \infty) = r(-u)$.
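
To make the exact relation concrete, the negative root $R$ can be computed numerically. The sketch below uses bisection and, as an illustrative assumption, exponential claims with $c < \lambda\mu$; the function names and the bracketing interval are hypothetical choices made for this example.

```python
import math

# Illustrative assumptions: exponential claims with mean mu and a premium
# rate c below the expected loss rate lam * mu.
lam, mu, c = 1.0, 1.0, 0.5

def g(xi):
    """g(xi) = lam * (E[exp(xi * X)] - 1) for exponential claims, xi < 1/mu."""
    return lam * mu * xi / (1.0 - mu * xi)

def negative_root(lo=-50.0, hi=-1e-9, tol=1e-12):
    """Bisection for the negative root of g(R) = c * R."""
    f = lambda xi: g(xi) - c * xi
    # f > 0 to the left of the root and f < 0 between the root and 0
    assert f(lo) > 0.0 > f(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

R = negative_root()

def r_minus(u):
    """r(-u) = P(T(-u) < infinity) = exp(R * u) = exp(-|R| * u)."""
    return math.exp(R * u)

print(R, r_minus(2.0))
```

For exponential claims the root is also available in closed form, $R = 1/\mu - \lambda/c$, which can be used to check the bisection.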

As before we can also estimate $P(T(-u) \le ut)$. If $c\xi - g(\xi) < 0$ we have

$$
e^{u\xi} \ge E\left[e^{(c\xi - g(\xi))T(-u)};\ T(-u) \le ut\right] \ge e^{(c\xi - g(\xi))ut}\,P(T(-u) \le ut)
$$

so that

$$
P(T(-u) \le ut) \le e^{-ut(c\xi - g(\xi)) + u\xi} = e^{-ut((c - 1/t)\xi - g(\xi))} = e^{-uth(c - 1/t)}
$$

if $\xi$ is chosen such that $g'(\xi) = c - 1/t$. This is possible if $g(\xi) > c\xi$, that is, if
$\xi < R < 0$ so that $g'(\xi) = c - 1/t < g'(R)$. Putting $1/\bar t = c - g'(R)$ gives the
condition $1/t > 1/\bar t$, that is, $t < \bar t$. Analogously we obtain

$$
P(ut < T(-u) < \infty) \le e^{-uth(c - 1/t)} \quad \text{for } t > \bar t.
$$

To summarize, we have the formulas $1/\bar t = c - g'(R)$, $H(t) = th(c - 1/t)$ and
$r(-u) = e^{-|R|u}$. Furthermore,

$$
P(T(-u) \le ut \mid T(-u) < \infty) = \frac{r(-u, ut)}{r(-u)} \le e^{-u(H(t)+R)} \quad \text{for } t < \bar t
$$

and

$$
P(ut < T(-u) < \infty \mid T(-u) < \infty) = \frac{r(-u) - r(-u, ut)}{r(-u)} \le e^{-u(H(t)+R)} \quad \text{for } t > \bar t.
$$

Since $H(t)$ is strictly convex with $H(t) \ge H(\bar t) = -R > 0$, we can hence localize
$T(-u)$ well near $u\bar t$.



3.2.9 An interpretation of the modified distribution $P_R(S(t) \in dx)$

The Esscher transformed distribution $P_R(S(t) \in dx)$ is defined by

$$
P_R(S(t) \in dx) = e^{Rx - tg(R)}F(t, dx)
$$

and we have seen that

$$
E_R\left[e^{\xi S(t)}\right] = e^{t(g(\xi+R) - g(R))}.
$$

Under this measure, we have $E_R[S(t)] = tg'(R)$ and $E_R[U(t)] = t(g'(R) - c) > 0$
when $c > \lambda\mu$. Hence, by the law of large numbers, $P_R(T(u) < \infty) = 1$. Also,
by the same theorem, since $E_R[U(u\bar t)] = u\bar t(g'(R) - c) = u$, we should expect
that $T(u) \approx u\bar t$ under the measure $P_R$ as $u \to \infty$. As we have just seen, given
that $T(u) < \infty$, we have that $T(u) \approx u\bar t$ as $u \to \infty$. Hence, it seems as if the
measure $P_R$ gives an approximate description of the conditional distribution of
the process $S(t)$ given that $T(u) < \infty$ as $u \to \infty$. It is not hard to see that this
is true for fixed $t$: Since $P(t < T(u) < \infty \mid T(u) < \infty) \to 1$ as $u \to \infty$, we have
the relation

$$
P(S(t) \in dx \mid T(u) < \infty) = \frac{P(S(t) \in dx,\ T(u) < \infty)}{r(u)} \approx \frac{P(S(t) \in dx,\ t < T(u) < \infty)}{r(u)}.
$$

Because of the Markov property, this probability equals

$$
\frac{P(S(t) \in dx)\,P(t < T(u) < \infty \mid S(t) = x)}{r(u)} \approx \frac{P(S(t) \in dx)\,r(u - x + ct)}{r(u)} = \frac{F(t, dx)\,r(u - x + ct)}{r(u)}
$$

since, if $S(t) = x$ we have $U(t) = x - ct$ and so the surplus at time $t$ is $u - x + ct$.
As $u \to \infty$ with $t$ fixed, we have $r(u) \approx Ce^{-Ru}$, implying that

$$
\frac{r(u - x + ct)}{r(u)} \approx e^{Rx - Rct}.
$$

Hence the conditional distribution of $S(t)$ converges to $F(t,dx)e^{Rx - Rct}$ and,
since $g(R) = Rc$,

$$
F(t,dx)e^{Rx - Rct} = F(t,dx)e^{Rx - tg(R)} = P_R(S(t) \in dx).
$$

A corresponding result holds for $T(-u)$ when $c < \lambda\mu$.

Using the distribution $P_R$ a fairly intuitive proof of the central limit theorem
for the quantity $(T(u) - u\bar t)/\sqrt{u}$ can be formulated as follows: If we invert the
relation between $P$ and $P_R$ we see that

$$
P(T(u) \le u\bar t + t\sqrt{u}) = E_R\left[e^{-RS(T(u)) + T(u)g(R)};\ T(u) \le u\bar t + t\sqrt{u}\right].
$$

Furthermore, $U(T(u)) = S(T(u)) - cT(u) = u + Z$, where $Z$ is the overshoot
over $u$ at the passage at $T(u)$. The overshoot $Z$ is bounded when $u$ is large and
approximately independent of $T(u)$. Hence, because $g(R) = cR$ the exponent
in this expression can be written $-R(cT(u) + u + Z) + cRT(u) = -Ru - RZ$,
so that

$$
P(T(u) \le u\bar t + t\sqrt{u}) \approx e^{-Ru}\,E_R\left[e^{-RZ}\right] P_R(T(u) \le u\bar t + t\sqrt{u}).
$$

The last probability can be estimated using the fact that, under the modified
measure $P_R$, the process $U(t) = S(t) - ct$ has positive drift $E_R[U(t)] = t(g'(R) -
c) = t/\bar t$ and variance $\mathrm{Var}_R(U(t)) = \mathrm{Var}_R(S(t)) = tg''(R) = t\sigma^2$. We can now
estimate $T(u)$ as follows. The law of large numbers tells us that $U(t)/t \to 1/\bar t$
when $t \to \infty$ and, since $T(u) \to \infty$ as $u \to \infty$, we have $U(T(u))/T(u) \to 1/\bar t$
as $u \to \infty$. But, since $U(T(u)) = u + Z$ with $Z$ bounded, this implies that
$u/T(u) \to 1/\bar t$, that is, $T(u)/u \to \bar t$. We can now use the central limit theorem
for $U(t)$, which tells us that the quantity

$$
X = \frac{U(t) - t/\bar t}{\sigma\sqrt{t}}
$$

has an approximately $N(0, 1)$ distribution when $t \to \infty$. Using this for $t =
T(u)$ we get



$$
X = \frac{U(T(u)) - T(u)/\bar t}{\sigma\sqrt{T(u)}} \approx \frac{\bar t\,U(T(u)) - T(u)}{\sigma\bar t^{3/2}\sqrt{u}},
$$

and, since $U(T(u)) = u + Z$, this can be written

$$
X \approx \frac{\bar t u - T(u) + \bar t Z}{\sigma\bar t^{3/2}\sqrt{u}}.
$$

Since $Z$ remains bounded, $Z/\sqrt{u}$ can be neglected when $u \to \infty$ and we finally
get the Gaussian approximation

$$
T(u) - u\bar t \approx -\sigma\bar t^{3/2}\sqrt{u}\,X
$$

under the measure $P_R$ and hence

$$
P_R\left(\frac{T(u) - u\bar t}{\sqrt{u}} \le \sigma\bar t^{3/2}x\right) \approx \Phi(x).
$$

Finally we get the corresponding formula for the measure $P$,

$$
P\left(\frac{T(u) - u\bar t}{\sqrt{u}} \le \sigma\bar t^{3/2}x\right) \approx C_R\,e^{-Ru}\,\Phi(x),
$$

with $C_R = \lim_{u\to\infty} E_R[e^{-RZ}]$. The value of the constant $C_R$ can be deduced
from the Cramér-Lundberg approximation $r(u) = P(T(u) < \infty) \approx Ce^{-Ru}$ (see
Section 3.2.5). If we let $x \to \infty$ we see that $C_R = C = (c - g'(0))/(g'(R) - c)$.
The asymptotic variance of $T(u)$ is hence $u\bar t^3\sigma^2 = ug''(R)/(g'(R) - c)^3$.
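
The Gaussian approximation can be turned into an approximation of the finite time ruin probability by putting $t = u\bar t + \sigma\bar t^{3/2}\sqrt{u}\,x$. The sketch below does this under the illustrative assumption of exponential claims; the function names `Phi`, `ruin_prob_finite` and `var_T` are hypothetical and introduced only for this example.

```python
import math

# Illustrative assumptions: exponential claims with mean mu, c > lam * mu.
lam, mu, c = 1.0, 1.0, 2.0

R = 1.0 / mu - lam / c                              # positive root of g(R) = c R
g1_0 = lam * mu                                     # g'(0)
g1_R = lam * mu / (1.0 - mu * R) ** 2               # g'(R)
g2_R = 2.0 * lam * mu ** 2 / (1.0 - mu * R) ** 3    # g''(R) = sigma^2

C = (c - g1_0) / (g1_R - c)
t_bar = 1.0 / (g1_R - c)
sigma = math.sqrt(g2_R)

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ruin_prob_finite(u, t):
    """Approximation r(u, t) ~ C exp(-R u) Phi(x) with x = (t - u t_bar) / (sigma t_bar^{3/2} sqrt(u))."""
    x = (t - u * t_bar) / (sigma * t_bar ** 1.5 * math.sqrt(u))
    return C * math.exp(-R * u) * Phi(x)

def var_T(u):
    """Asymptotic variance u * t_bar^3 * sigma^2 of T(u)."""
    return u * t_bar ** 3 * g2_R

print(ruin_prob_finite(10.0, 5.0), ruin_prob_finite(10.0, 50.0), var_T(10.0))
```

At the horizon $t = u\bar t$ the approximation equals half of $Ce^{-Ru}$, and for horizons well beyond $u\bar t$ it approaches $Ce^{-Ru}$ itself.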



3.2.10 An interesting property of a composite system

Let us collect the approximate formulas for $r(u)$ and $T(u)$ as follows. The
exponent $R$ and the time $\bar t$ are determined by $c = g(R)/R$ and $\bar t = 1/(g'(R) - c)$.
The approximate time of ruin is $T = u\bar t = u/(g'(R) - c)$ and, if we define
$C = (c - g'(0))/(g'(R) - c)$, then $r(u) \approx r(u,t) \approx Ce^{-Ru}$ for $t > T$. This
means that, if our planning horizon is $T$, then the probability of ruin, $r(u)$, is a
reasonable approximation for the finite time ruin probability $r(u,t)$ if $t > T$. If
ruin happens it takes place for $T(u) \approx T$.

Let us now consider a system consisting of two independent pieces so that $S(t) =
S_1(t) + S_2(t)$ with $S_1(t)$ and $S_2(t)$ independent, and hence $g(\xi) = g_1(\xi) + g_2(\xi)$.
It is interesting to compare the quantities of the pieces to those of the total
system. If they have the same $R$, we get

$$
c = \frac{g(R)}{R} = \frac{g_1(R)}{R} + \frac{g_2(R)}{R} = c_1 + c_2
$$

and, if they have the same $T$, we obtain

$$
u = T(g'(R) - c) = T(g_1'(R) - c_1 + g_2'(R) - c_2) = u_1 + u_2.
$$

If we use these $c_i$ and $u_i$, we get

$$
r(u) \approx Ce^{-Ru} \approx r_1(u_1)\,r_2(u_2),
$$

since $r_1(u_1)\,r_2(u_2) \approx C_1C_2e^{-R(u_1+u_2)} = (C_1C_2/C)\,Ce^{-Ru}$ and, from the fact that

$$
C = \frac{c_1 - g_1'(0) + c_2 - g_2'(0)}{g_1'(R) - c_1 + g_2'(R) - c_2},
$$

it follows that $C_1 \le C_1C_2/C \le C_2$ if $C_1 \le C_2$, that is, the constants are
comparable.
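
The decomposition can be illustrated numerically. In the sketch below both pieces have exponential claims, and the values of $R$, $T$ and the claim parameters are hypothetical choices made only for this example; the helper `piece` is not part of the notes.

```python
import math

def piece(lam, mu, R, T):
    """Quantities of one piece with exponential claims (an illustrative choice)."""
    c = lam * mu / (1.0 - mu * R)            # c_i = g_i(R) / R
    g1_0 = lam * mu                          # g_i'(0)
    g1_R = lam * mu / (1.0 - mu * R) ** 2    # g_i'(R)
    u = T * (g1_R - c)                       # u_i = T (g_i'(R) - c_i)
    C = (c - g1_0) / (g1_R - c)              # constant C_i
    return {"c": c, "u": u, "C": C, "g1_0": g1_0, "g1_R": g1_R,
            "r": C * math.exp(-R * u)}       # r_i(u_i) ~ C_i exp(-R u_i)

R, T = 0.5, 5.0                              # common values, set centrally
p1 = piece(lam=1.0, mu=1.0, R=R, T=T)
p2 = piece(lam=2.0, mu=0.5, R=R, T=T)

# Quantities of the total system
c = p1["c"] + p2["c"]
u = p1["u"] + p2["u"]
C = (c - p1["g1_0"] - p2["g1_0"]) / (p1["g1_R"] + p2["g1_R"] - c)
r_total = C * math.exp(-R * u)

ratio = p1["r"] * p2["r"] / r_total          # equals C_1 C_2 / C
print(c, u, C, ratio)
```

In this example the ratio $r_1(u_1)r_2(u_2)/r(u)$ lies between $C_1$ and $C_2$, which is the sense in which the two sides are comparable.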

There is hence a natural decomposition of $c$ and $u$ into $c_1 + c_2$ and $u_1 + u_2$, so that
if we have a common $T$ and $R$, then $r(u) \approx r_1(u_1)r_2(u_2)$, which is the probability
that both systems are ruined. Neither system is, so to speak, unnecessarily
safe compared to the other. This is also an example of decentralized planning:
In order to calculate $r(u)$ the central actuary only has to give the values of $R$
and $T$ to the local actuaries who can then calculate $r_1(u_1)$ and $r_2(u_2)$ and return
them, and $r(u) \approx r_1(u_1)r_2(u_2)$.



4 Summary of the formulas

In this section we give a concise summary of the formulas that have been derived
in the notes.

The individual risk model

$X$ = total amount of loss
$= \sum_i x_i M_i$, where $\{M_i\}$ are Bernoulli variables with $P(M_i = 1) = p_i = 1 - q_i$.

Moments: $E[X] = \sum_i x_i p_i$

$\mathrm{Var}(X) = \sum_i x_i^2 p_i q_i$

Generating function: $E[e^{\xi X}] = \prod_i (q_i + p_i e^{\xi x_i})$

Compound Poisson approximation:

$S = \sum_i x_i N_i$, where $\{N_i\}$ are Poisson variables with $e^{-\lambda_i} = q_i$

Generating function: $E[e^{\xi S}] = \exp\{\sum_i \lambda_i (e^{\xi x_i} - 1)\}$

$= e^{g(\xi)}$, where $g(\xi) = \sum_i \lambda_i (e^{\xi x_i} - 1)$
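
The formulas of the individual risk model can be checked on a small portfolio. The sketch below computes the exact distribution of $X$ by convolution and compares its moments and generating function with the formulas above; the portfolio values and all function names are hypothetical.

```python
import math

# Hypothetical portfolio: (x_i, p_i) = (loss amount, claim probability).
portfolio = [(1, 0.05), (2, 0.02), (2, 0.10), (5, 0.01)]

def individual_pmf(risks):
    """Exact distribution of X = sum_i x_i M_i by successive convolution."""
    pmf = {0: 1.0}
    for x, p in risks:
        new = {}
        for s, prob in pmf.items():
            new[s] = new.get(s, 0.0) + prob * (1.0 - p)    # M_i = 0
            new[s + x] = new.get(s + x, 0.0) + prob * p    # M_i = 1
        pmf = new
    return pmf

def log_mgf_individual(risks, xi):
    """log E[exp(xi X)] = sum_i log(q_i + p_i exp(xi x_i))."""
    return sum(math.log(1.0 - p + p * math.exp(xi * x)) for x, p in risks)

def g_compound_poisson(risks, xi):
    """g(xi) = sum_i lam_i (exp(xi x_i) - 1) with exp(-lam_i) = q_i."""
    return sum(-math.log(1.0 - p) * (math.exp(xi * x) - 1.0) for x, p in risks)

pmf = individual_pmf(portfolio)
mean = sum(s * prob for s, prob in pmf.items())
var = sum(s * s * prob for s, prob in pmf.items()) - mean ** 2

# Mean of the compound Poisson approximation: sum_i x_i lam_i
mean_cp = sum(-math.log(1.0 - p) * x for x, p in portfolio)
print(mean, var, mean_cp)
```

With the choice $e^{-\lambda_i} = q_i$ we have $\lambda_i > p_i$, so the compound Poisson approximation has a slightly larger mean than the individual model.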

The collective risk model

$S(t)$ = total amount of loss in $(0, t)$
$N(t)$ = number of accidents in $(0, t)$
$X_i$ = the losses in the accidents

$S(t) = \sum_{i=1}^{N(t)} X_i$

$\{N(t)\}$ is a Poisson process with $E[N(t)] = \lambda t$.

$\{X_i\}$ are i.i.d. with distribution $F(dx)$, $E[X_i] = \mu$, $E[X_i^2] = \nu$.

The distribution of $S(t)$ is $F(t,dx) = P(S(t) \in dx)$.

Generating function: $E[e^{\xi S(t)}] = e^{tg(\xi)}$ with $g(\xi) = \lambda \int_0^\infty (e^{\xi x} - 1)\,F(dx)$.

Moments: $E[S(t)] = tg'(0) = t\lambda\mu$

$\mathrm{Var}(S(t)) = tg''(0) = t\lambda\nu$.

Panjer recursion for the density of $S(t)$

Assume that $X_i$ have a discrete distribution with $P(X_i = nd) = f_n$. Then
$P(S(t) = md) = g_m$ are given by the recursion

$$
m g_m = \lambda t \sum_{n=1}^{m} n f_n g_{m-n}, \quad m = 1, 2, \ldots
$$
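
A minimal sketch of the recursion in code is given below. The starting value $g_0 = P(S(t) = 0) = e^{-\lambda t(1 - f_0)}$ is the standard one for a compound Poisson sum and is stated here as an assumption of the example, as are the function name and the claim size distribution.

```python
import math

def panjer_compound_poisson(lam_t, f, m_max):
    """Return [g_0, ..., g_m_max] with g_m = P(S(t) = m * d).

    lam_t -- expected number of claims lam * t
    f     -- list with f[n] = P(X_i = n * d); f[0] may be zero
    """
    g = [0.0] * (m_max + 1)
    g[0] = math.exp(-lam_t * (1.0 - f[0]))        # starting value P(S(t) = 0)
    for m in range(1, m_max + 1):
        upper = min(m, len(f) - 1)
        s = sum(n * f[n] * g[m - n] for n in range(1, upper + 1))
        g[m] = lam_t * s / m                      # m g_m = lam t sum_n n f_n g_{m-n}
    return g

# Hypothetical claim size distribution on {d, 2d, 3d}
f = [0.0, 0.5, 0.3, 0.2]
g = panjer_compound_poisson(lam_t=2.0, f=f, m_max=60)
print(g[:5])
```

The recursion needs of the order of $m$ operations for each $g_m$, which is far cheaper than summing the convolution powers of $\{f_n\}$ directly.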

Approximations of $P(S(t) \ge tx)$

Entropy function: $h(x) = \max_\xi \{x\xi - g(\xi)\}$

$= x\xi_x - g(\xi_x)$, with $\xi_x$ defined by $g'(\xi_x) = x$.

The functions $g(\xi)$ and $h(x)$ are convex.

We have $g(\xi) = \max_x \{\xi x - h(x)\}$

$= \xi x_\xi - h(x_\xi)$, with $x_\xi$ defined by $h'(x_\xi) = \xi$.

The functions $x = g'(\xi)$ and $\xi = h'(x)$ are inverses of each other.
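
The entropy function can be evaluated numerically by solving $g'(\xi_x) = x$. The sketch below does this by bisection for exponential claims, an illustrative assumption under which $h(x) = (\sqrt{x/\mu} - \sqrt{\lambda})^2$ is available as a check, and also evaluates the bound $e^{-th(x)}$; all names are hypothetical.

```python
import math

lam, mu = 1.0, 1.0      # illustrative: exponential claims with mean mu

def g(xi):
    """g(xi) for exponential claims, valid for xi < 1/mu."""
    return lam * mu * xi / (1.0 - mu * xi)

def g_prime(xi):
    return lam * mu / (1.0 - mu * xi) ** 2

def xi_of_x(x):
    """Solve g'(xi) = x by bisection; g' is increasing on xi < 1/mu."""
    lo, hi = -1e6, 1.0 / mu - 1e-12
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g_prime(mid) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def h(x):
    """Entropy function h(x) = x * xi_x - g(xi_x)."""
    xi = xi_of_x(x)
    return x * xi - g(xi)

def chernoff_bound(t, x):
    """Upper bound exp(-t h(x)) for P(S(t) >= t x) when x >= lam * mu."""
    return math.exp(-t * h(x))

print(h(3.0), (math.sqrt(3.0) - 1.0) ** 2, chernoff_bound(10.0, 3.0))
```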



Chernoff's bound:

$$
\begin{cases}
P(S(t) \ge tx) \le e^{-th(x)} & \text{if } x \ge \lambda\mu; \\
P(S(t) \le tx) \le e^{-th(x)} & \text{if } x \le \lambda\mu.
\end{cases}
$$

Esscher's approximation:

The Esscher transform of $F(dx)$ is $F_a(dx) = e^{ax}F(dx)/f(a)$ with $f(a) = \int_0^\infty e^{ax}F(dx)$.
$E_a[e^{\xi X}] = f(\xi + a)/f(a)$

The transform of $F(t, dx)$ is $P_a(S(t) \in dx) = F_a(t, dx) = e^{ax - tg(a)}F(t, dx)$.
$E_a[e^{\xi S(t)}] = e^{t(g(\xi + a) - g(a))} = e^{tg_a(\xi)}$

Moments: $E_a[S(t)] = tg'(a)$

$\mathrm{Var}_a(S(t)) = tg''(a)$

Esscher's approximation tells us that

$$
P(S(t) \ge tx) \approx \frac{e^{-th(x)}}{\sqrt{2\pi}\,a\sqrt{tg''(a)}}
$$

with $x = g'(a) > \lambda\mu = g'(0)$, $a > 0$. This is valid for a continuous distribution.
For a discrete distribution with span $d$, the factor $a$ is changed into $A(d) =
(1 - e^{-ad})/d$.

Ruin probabilities

$U(t) = S(t) - ct$ = net amount of loss in $(0, t)$
$u$ = initial capital

$T(u) = \min\{t;\ U(t) > u\}$ = time of ruin
$T(-u) = \min\{t;\ U(t) = -u\}$

$r(\pm u, t) = P(T(\pm u) \le t)$ = ruin probabilities in finite time
$r(\pm u) = P(T(\pm u) < \infty)$ = ruin probabilities in infinite time

For $u = 0$, we have the explicit formula



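$$
P(T(0) > t,\ S(t) \in dx) = \left(1 - \frac{x}{ct}\right)_+ F(t, dx)
$$

and hence

$$
P(T(0) > t) = 1 - r(0, t) = \int_0^{ct} \left(1 - \frac{x}{ct}\right) F(t, dx).
$$

The distribution of $T(-u)$

For $ct > u > 0$ we have $P(T(-u) \in dt) = (u/ct)\,F(t, c\,dt - u)$, where $F(t, c\,dt - u)$
denotes the mass that $F(t, dx)$ gives to an interval of length $c\,dt$ at $x = ct - u$. Hence
$r(-u, t)$ and $r(-u)$ are obtained by integrating the expression $(u/cs)\,F(s, c\,ds - u)$,
with $x = cs - u$, over $s \le t$ and over all $s$, respectively.

If $c \ge \lambda\mu$, we have $r(-u) = 1$.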

The distribution of $T(u)$

Seal's formula:

$$
r(u, t) = \int_{u + ct}^{\infty} F(t, dx) + \int_0^t F(s, u + c\,ds)\,\bar r(0, t - s)
$$

where $\bar r(0, t - s) = 1 - r(0, t - s)$.
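
The survival probability $\bar r(0, t) = P(T(0) > t)$ that enters Seal's formula follows from the explicit formula for $u = 0$ as $E[(1 - S(t)/ct)_+]$. The sketch below evaluates it in the simplest case where every claim has size 1, so that $S(t) = N(t)$ is Poisson distributed; this claim distribution and the function name are assumptions made only for the example.

```python
import math

def survival_from_zero(t, lam, c):
    """P(T(0) > t) = E[(1 - S(t)/(c t))_+] when every claim has size 1."""
    ct = c * t
    p = math.exp(-lam * t)          # P(N(t) = 0)
    total, n = 0.0, 0
    while n <= ct:
        total += (1.0 - n / ct) * p
        n += 1
        p *= lam * t / n            # Poisson recursion: P(N = n) from P(N = n - 1)
    return total

for t in (0.1, 1.0, 10.0, 200.0):
    print(t, survival_from_zero(t, lam=1.0, c=2.0))
```

For large $t$ and $c > \lambda\mu$ the values approach $1 - \lambda\mu/c$, the probability that the net loss never becomes positive.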

Cramér's formula for $r(u)$

The upcrossings $\{U_i\}$ are i.i.d. with $r := P(U_1 > 0) = \lambda\mu/c$ if $c > \lambda\mu$, and $U_1$
has the conditional density $k(u) = (1 - F(u))/\mu$, given that $U_1 > 0$.

The density of $U = \max_{t \ge 0} U(t)$ is $l(u) = (1 - r)\sum_{m=0}^{\infty} r^m k^{m*}(u)$ and we have

$$
r(u) = P(U > u) = \int_u^\infty l(v)\,dv.
$$

Panjer approximation of $r(u)$

Approximate the density $k(u)$ by a discrete one with masses $\{k_n\}$ for $u = nd$,
$n = 1, 2, \ldots$. Then the corresponding approximation $\{l_n\}$ for $l(u)$, $u = nd$, can
be calculated by the iteration

$$
\begin{cases}
l_n = r(k_1 l_{n-1} + \ldots + k_n l_0) & \text{for } n \ge 1; \\
l_0 = 1 - r.
\end{cases}
$$

The ruin probability $r(u)$ is approximated by $r_n = \sum_{m > n} l_m$ for $u = nd$.
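
The iteration can be sketched in a few lines. The example below assumes exponential claims, for which the closed form $r(u) = (\lambda\mu/c)e^{-Ru}$ is a standard result that serves as a check; the span $d$, the rule for placing the masses $k_n$ and the variable names are illustrative choices.

```python
import math

lam, mu, c = 1.0, 1.0, 2.0       # illustrative: exponential claims, c > lam * mu
r = lam * mu / c                 # r = P(U_1 > 0)
d = 0.01                         # span of the discretization (an assumption)
n_max = 500                      # u ranges over d, 2d, ..., n_max * d

def K(u):
    """Distribution function of the density k(u) = (1 - F(u)) / mu."""
    return 1.0 - math.exp(-u / mu)

# The mass of the interval ((n-1)d, nd] is moved to the point nd (one possible choice).
k = [0.0] + [K(n * d) - K((n - 1) * d) for n in range(1, n_max + 1)]

l = [1.0 - r]                    # l_0 = 1 - r
for n in range(1, n_max + 1):
    l.append(r * sum(k[j] * l[n - j] for j in range(1, n + 1)))

# r_n = P(U > nd) = 1 - (l_0 + ... + l_n)
ruin = []
acc = 0.0
for n in range(n_max + 1):
    acc += l[n]
    ruin.append(1.0 - acc)

R = 1.0 / mu - lam / c
exact = r * math.exp(-R * n_max * d)    # closed form for exponential claims
print(ruin[n_max], exact)
```

Moving each interval's mass to its right end point makes the discretized upcrossings slightly too large, so in this example the approximation lies a little above the exact value; a smaller span $d$ reduces the difference.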

Cramér-Lundberg's approximation of $r(u)$

For $c > \lambda\mu$, let $R$ be the positive root of the equation $g(R) = cR$ and define
$C = (c - g'(0))/(g'(R) - c)$. Then

$$
r(u) \approx Ce^{-Ru} \quad \text{as } u \to \infty.
$$

Similarly, for $c < \lambda\mu$, we have $r(-u) = e^{Ru}$, where $R$ is the negative root of
$g(R) = cR$.
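
When no closed form is available, the positive root $R$ has to be found numerically. The sketch below uses bisection for the hypothetical case where every claim has size 1, so that $g(\xi) = \lambda(e^{\xi} - 1)$; the parameter values and function names are assumptions made for this example.

```python
import math

lam, c = 1.0, 1.5       # illustrative: every claim has size 1, so lam * mu = 1 < c

def g(xi):
    return lam * (math.exp(xi) - 1.0)

def g_prime(xi):
    return lam * math.exp(xi)

def positive_root(hi=1.0):
    """Bisection for the positive root of g(R) = c * R."""
    f = lambda xi: g(xi) - c * xi
    while f(hi) <= 0.0:       # enlarge the bracket until f(hi) > 0
        hi *= 2.0
    lo = 0.0
    # f is convex with f(0) = 0 and f'(0) < 0, so f < 0 on (0, R) and f > 0 beyond R
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

R = positive_root()
C = (c - g_prime(0.0)) / (g_prime(R) - c)

def cramer_lundberg(u):
    """Approximation r(u) ~ C exp(-R u)."""
    return C * math.exp(-R * u)

print(R, C, cramer_lundberg(5.0))
```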

Approximation of $r(u, t)$

For $c > \lambda\mu$, define $T = u\bar t = u/(g'(R) - c)$. Then $T(u) \approx T$ if $T(u) < \infty$. More
accurately, if $H(t) = th(c + 1/t)$, then

$$
P(T(u) \le ut \mid T(u) < \infty) \le C^{-1}e^{-u(H(t) - R)} \quad \text{for } t < \bar t
$$

and

$$
P(ut < T(u) < \infty \mid T(u) < \infty) \le C^{-1}e^{-u(H(t) - R)} \quad \text{for } t > \bar t
$$

and $R = \min_t H(t) = H(\bar t)$. Similarly, for $c < \lambda\mu$, if we define $T = u\bar t =
u/(c - g'(R))$ and $H(t) = th(c - 1/t)$, we have

$$
P(T(-u) \le ut \mid T(-u) < \infty) \le e^{-u(H(t) + R)} \quad \text{for } t < \bar t
$$

and

$$
P(ut < T(-u) < \infty \mid T(-u) < \infty) \le e^{-u(H(t) + R)} \quad \text{for } t > \bar t
$$

and $-R = \min_t H(t) = H(\bar t)$.
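
The bounds for the case $c < \lambda\mu$ can be evaluated in the same way as for $c > \lambda\mu$. The sketch below assumes exponential claims, for which $h(x) = (\sqrt{x/\mu} - \sqrt{\lambda})^2$ for $x > 0$; the parameter values and the function names are illustrative assumptions.

```python
import math

lam, mu, c = 1.0, 1.0, 0.5      # illustrative: exponential claims, c < lam * mu

R = 1.0 / mu - lam / c                      # negative root of g(R) = c * R
g1_R = lam * mu / (1.0 - mu * R) ** 2       # g'(R)
t_bar = 1.0 / (c - g1_R)

def h(x):
    """Entropy function for exponential claims, valid for x > 0."""
    return (math.sqrt(x / mu) - math.sqrt(lam)) ** 2

def H(t):
    """H(t) = t * h(c - 1/t), evaluated here only for t > 1/c."""
    return t * h(c - 1.0 / t)

def bound(u, t):
    """Upper bound exp(-u (H(t) + R)) for the conditional probabilities."""
    return math.exp(-u * (H(t) + R))

for t in (3.0, t_bar, 8.0):
    print(t, H(t), bound(10.0, t))
```

The bound equals 1 at $t = \bar t$ and decays exponentially in $u$ for every other $t$, which is what localizes $T(-u)$ near $u\bar t$.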

Interpretation of the transformed distribution of $S(t)$

When $c > \lambda\mu$, the transformed distribution

$$
F_R(t, dx) = P_R(S(t) \in dx) = e^{Rx - tg(R)}F(t, dx)
$$

is equal to $\lim_{u\to\infty} P(S(t) \in dx \mid T(u) < \infty)$ and the corresponding result holds
for $T(-u)$ when $c < \lambda\mu$. Hence we have

$$
E[S(t) \mid T(u) < \infty] \to E_R[S(t)] = tg'(R) \quad \text{as } u \to \infty.
$$

This explains the formula for $T$, because $E_R[U(t)] = t(g'(R) - c)$ so $T$ is that
value of $t$ for which this is equal to $u$.
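
The change of drift under $P_R$ can be made explicit in a simple case. Since $g_R(\xi) = g(\xi + R) - g(R) = \lambda f(R)\int_0^\infty (e^{\xi x} - 1)F_R(dx)$, the process $S(t)$ is again compound Poisson under $P_R$, with intensity $\lambda f(R)$ and claim distribution $F_R$. The sketch below evaluates these quantities for exponential claims, an illustrative assumption under which $F_R$ is again exponential; the variable names are hypothetical.

```python
# Illustrative assumptions: exponential claims with mean mu, c > lam * mu.
lam, mu, c = 1.0, 1.0, 2.0

R = 1.0 / mu - lam / c                 # positive root of g(R) = c * R

f_R = 1.0 / (1.0 - mu * R)             # f(R) = E[exp(R * X)] for exponential X
lam_R = lam * f_R                      # claim intensity under P_R
mu_R = mu / (1.0 - mu * R)             # mean claim under F_R (again exponential)

drift_P = lam * mu - c                 # E[U(t)] / t under P (negative)
drift_PR = lam_R * mu_R - c            # E_R[U(t)] / t = g'(R) - c (positive)
t_bar = 1.0 / drift_PR                 # T = u * t_bar solves E_R[U(T)] = u

print(lam_R, mu_R, drift_P, drift_PR, t_bar)
```

Under $P_R$ claims are both more frequent and larger than under $P$, which turns the negative drift of $U(t)$ into a positive one.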

The central limit theorem for $T(u)$

When $u \to \infty$ we have

$$
P\left(\frac{T(u) - u\bar t}{\sqrt{u}} \le \sigma\bar t^{3/2}x\right) \approx Ce^{-Ru}\,\Phi(x)
$$

with $C = (c - g'(0))/(g'(R) - c)$, $\bar t = 1/(g'(R) - c)$ and $\sigma^2 = g''(R)$.



5 Notes and references

The notes on risk theory by Harald Cramér from 1930 [3] still form a very
readable introduction to the subject. The idea of using Lemma 3.1, the so-called
ballot theorem, to derive the formulas for ruin probabilities is developed by
Lajos Takács in [6]. Hopefully our treatment is more understandable. The use
of tools from large deviation theory to derive asymptotic estimates is developed
by the author in [5]. An alternative way of studying $T(u)$, which allows a central
limit theorem to be proved, is developed by Bengt von Bahr in [2]. A modern
and comprehensive treatment of the theory of ruin probabilities is given in [1]
and [4].